Microbiome visualization

Last updated: 2021-07-03

Checks: 7 0

Knit directory: wildlife-bacteria/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20210129)

The command set.seed(20210129) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: af6ad5b

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version af6ad5b. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/_footer.html
    Ignored:    data/.DS_Store
    Ignored:    output/.DS_Store
    Ignored:    output/plots/.DS_Store
    Ignored:    output/plots/QC/.DS_Store
    Ignored:    output/plots/boxplots_select_taxa/.DS_Store
    Ignored:    output/plots/heatmaps/.DS_Store
    Ignored:    output/plots/maps/.DS_Store
    Ignored:    output/plots/tax_prev_abund/.DS_Store

Untracked files:
    Untracked:  ENA_docs/
    Untracked:  NCBI_data/
    Untracked:  data/dada2/
    Untracked:  data/dada2_tois/
    Untracked:  data/taxa_trees/
    Untracked:  data/tmp/
    Untracked:  output/beta-div-statistics.txt
    Untracked:  output/supp_table_pos.xlsx
    Untracked:  tmp/

Unstaged changes:
    Modified:   RData/amp_dec_bact.RData
    Modified:   RData/amp_raw_bact.RData
    Modified:   RData/ps_dec_bact.RData
    Modified:   RData/ps_raw_bact.RData
    Modified:   README.md
    Deleted:    analysis/site-map.Rmd
    Deleted:    analysis/tois.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/microbiome-viz.Rmd) and HTML (docs/microbiome-viz.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	af6ad5b	siobhon-egan	2021-07-03	general update
html	e926c9d	siobhon-egan	2021-07-02	fix conflicts
html	33466dd	siobhon-egan	2021-07-01	update inc accession no
html	5f4c86d	siobhon-egan	2021-05-23	restructure pages
Rmd	0d602a3	siobhon-egan	2021-05-23	updates to pages
html	0d602a3	siobhon-egan	2021-05-23	updates to pages
html	9ecc095	siobhon-egan	2021-04-24	Build site.
html	a69dea3	siobhon-egan	2021-04-24	Build site.
Rmd	6ebcc5d	siobhon-egan	2021-04-24	updates
html	6ebcc5d	siobhon-egan	2021-04-24	updates
html	a0173d4	siobhon-egan	2021-02-26	Build site.
Rmd	d502188	siobhon-egan	2021-02-26	microbiome visualization

Visualization and analysis of microbiome data.

As this page contains extensive microbiome analysis and some parts are computationally heavy only a selection of figures have been included in the rendered webpage.

Note this was run using R version 4.0.3 and RStudo version 1.4. See R info for full details of R session.

See Phyloseq page for details on data cleaning.

0. Getting started

Install libraries if required

Only need to run this code once.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# phyloseq
source('http://bioconductor.org/biocLite.R')
biocLite('phyloseq')
#tidyverse
install. packages("tidyverse")
#ampvis2
install.packages("remotes")
remotes::install_github("MadsAlbertsen/ampvis2")
#ampvis2extras
install.packages("BiocManager")
BiocManager::install("kasperskytte/ampvis2extras")
#ggpubr
install.packages("ggpubr")
#agricolae
install.packages("agricolae")
install.packages("remotes")
remotes::install_github("DanielSprockett/reltools")
devtools::install_github('jsilve24/philr')
#decontam
BiocManager::install("decontam")
library(decontam)
BiocManager::install("FelixErnst/mia")
# Microbota process
remotes::install_github("YuLab-SMU/MicrobiotaProcess")

Load libraries

pkgs <- c("qiime2R", "phyloseq", "tidyverse", "ampvis2", "ampvis2extras", 
          "ggpubr", "agricolae", "plotly", "viridis", "cowplot", "MicrobeR", 
          "microbiome", "reshape", "decontam", "data.table", "ape", "DESeq2", 
          "vegan", "microbiomeutilities", "knitr", "tibble", "dplyr", 
          "patchwork", "Biostrings", "mia", "eulerr", "MicrobiotaProcess")
lapply(pkgs, require, character.only = TRUE)

Load data as imported and filtered as per phyloseq

# phyloseq raw data
load("Rdata/ps_raw_bact.RData")
# phyloseq decontam data
load("Rdata/ps_dec_bact.RData")

# ampvis2 raw data
load("Rdata/amp_raw_bact.RData")
# ampvis2 decontam data
load("Rdata/amp_dec_bact.RData")

Subset decontam objects to remove controls and create obj for each sample category Subset ampvis2 object based on sample category

# remove controls from phyloseq decontam data
ps_bact_samp = subset_samples(ps_dec_bact, !SampleType=="Control")
# sample category
ps_bact_bl = subset_samples(ps_bact_samp, SampleCategory=="Blood")
ps_bact_tis = subset_samples(ps_bact_samp, SampleCategory=="Tissue")
ps_bact_tick = subset_samples(ps_bact_samp, SampleCategory=="Tick")

# remove controls from ampvis2 decontam data
amp_samp <- amp_subset_samples(amp_dec_bact, 
                                 !SampleType %in% c("Control"),
                                 RemoveAbsents = TRUE)
# sample category
amp_bl<- amp_subset_samples(amp_samp,
                            SampleCategory %in% c("Blood"),
                            RemoveAbsents = TRUE)
amp_tis<- amp_subset_samples(amp_samp,
                            SampleCategory %in% c("Tissue"),
                            RemoveAbsents = TRUE)
amp_tick<- amp_subset_samples(amp_samp,
                            SampleCategory %in% c("Tick"),
                            RemoveAbsents = TRUE)

Set colours for Sample Category

ColSampCat = c("#7a255d", "#9fd0cb", "#7566ff")

1. Alpha diversity

1.1. Rarefaction plots

Plotting of rarefaction curves using ampvis2 - separating out by sample type. Use raw ASV count data for inspection of sequencing depth (using raw data here not decon data)

# Subset blood samples from raw data 
amp_raw_bl0 <- amp_subset_samples(amp_raw_bact, 
                              SampleCategory %in% c("Blood"),
                              minreads = 0,
                              RemoveAbsents = TRUE)
rarecurveblraw = amp_rarecurve(amp_raw_bl0, stepsize = 100)
rarecurveblraw1 = rarecurveblraw + ylab(label = "Number of ASVs") +
  theme(axis.text.x = element_text(angle = 45, size=10, vjust = 0.5)) + ggtitle("Blood") + theme_bw()
# Subset Tick samples form raw data 
amp_raw_tick0 <- amp_subset_samples(amp_raw_bact, 
                              SampleCategory %in% c("Tick"),
                              minreads = 0,
                              RemoveAbsents = TRUE)
rarecurvetickraw = amp_rarecurve(amp_raw_tick0, stepsize = 100)
rarecurvetickraw1 = rarecurvetickraw + ylab(label = "Number of ASVs") +
  theme(axis.text.x = element_text(angle = 45, size=10, vjust = 0.5)) + ggtitle("Tick") + theme_bw()
# Subset Tissue samples form raw data 
amp_raw_tis100 <- amp_subset_samples(amp_raw_bact, 
                              SampleCategory %in% c("Tissue"),
                              minreads = 100,
                              RemoveAbsents = TRUE)
rarecurvetisraw = amp_rarecurve(amp_raw_tis100, stepsize = 100)
rarecurvetisraw1 = rarecurvetisraw + ylab(label = "Number of ASVs") +
  theme(axis.text.x = element_text(angle = 45, size=10, vjust = 0.5)) + ggtitle("Tissue") + theme_bw()

# Merge into one figure
rarecurve <- ggarrange(rarecurveblraw1, rarecurvetickraw1, rarecurvetisraw1,
                    labels = c("A", "B", "C"),
                    ncol = 1, nrow = 3)
ggsave("rarecurve.pdf", plot = rarecurve, path = "output/plots", width = 25, height = 40, units = "cm")

1.2. Alpha diversity indexes

Make using alpha diversity plots with statistical values using microbiomeutilities

obs_alpha_plot <- plot_diversity_stats(ps_bact_samp, group = "SampleCategory", 
                            index = "observed",
                            group.colors = ColSampCat,
                            label.format="p.format",
                            stats = TRUE)
obs_alpha_plot = obs_alpha_plot + ylab("No. Observed ASVs") + xlab("") + labs(title = "Observed")
chao1_alpha_plot <- plot_diversity_stats(ps_bact_samp, group = "SampleCategory", 
                            index = "chao1",
                            group.colors = ColSampCat,
                            label.format="p.format",
                            stats = TRUE)
chao1_alpha_plot = chao1_alpha_plot + ylab("Chao1 Index") + xlab("") + labs(title = "Chao1")
shan_alpha_plot <- plot_diversity_stats(ps_bact_samp, group = "SampleCategory", 
                            index = "diversity_shannon",
                            group.colors = ColSampCat,
                            label.format="p.format",
                            stats = TRUE)
shan_alpha_plot = shan_alpha_plot + ylab("Shannon Index") + xlab("") + labs(title = "Shannon")
invsimp_alpha_plot <- plot_diversity_stats(ps_bact_samp, group = "SampleCategory", 
                            index = "diversity_inverse_simpson",
                            group.colors = ColSampCat,
                            label.format="p.format",
                            stats = TRUE)
invsimp_alpha_plot = invsimp_alpha_plot + ylab("Inverse Simpson Index") + xlab("") + labs(title = "Inverse Simpson")

#Merge all four alpha div plots into one figure
alphadiv_wp <- ggarrange(obs_alpha_plot, chao1_alpha_plot, shan_alpha_plot, invsimp_alpha_plot,
                    labels = c("A", "B", "C", "D"),
                    ncol = 2, nrow = 2)
#ggsave("alphadiv_withpvalues.pdf", plot = alphadiv_wp, path = "output/plots", width = 30, height = 30, units = "cm")
alphadiv_wp

EXTRA - Tick samples only

Repeat but just on tick data for supplementary information Make using alpha diversity plots with statistical values using microbiomeutilities

# subset tick species
ps_bact_tick_sub = subset_samples(ps_bact_tick, !tick_sp_short=="Ixhol;Ixtri")
tickspcols = c("#000004FF", "#781C6DFF", "#BB3754FF", "#ED6925FF", "#FCB519FF", "#c5cc00")

obs_alpha_plot_tk <- plot_diversity_stats(ps_bact_tick_sub, group = "tick_sp_short", 
                            index = "observed",
                            group.colors = tickspcols,
                            label.format="p.format",
                            stats = TRUE)
obs_alpha_plot_tk = obs_alpha_plot_tk + ylab("No. Observed ASVs") + xlab("") + labs(title = "Observed")
chao1_alpha_plot_tk <- plot_diversity_stats(ps_bact_tick_sub, group = "tick_sp_short", 
                            index = "chao1",
                            group.colors = tickspcols,
                            label.format="p.format",
                            stats = TRUE)
chao1_alpha_plot_tk = chao1_alpha_plot_tk + ylab("Chao1 Index") + xlab("") + labs(title = "Chao1")
shan_alpha_plot_tk <- plot_diversity_stats(ps_bact_tick_sub, group = "tick_sp_short", 
                            index = "diversity_shannon",
                            group.colors = tickspcols,
                            label.format="p.format",
                            stats = TRUE)
shan_alpha_plot_tk = shan_alpha_plot_tk + ylab("Shannon Index") + xlab("") + labs(title = "Shannon")
invsimp_alpha_plot_tk <- plot_diversity_stats(ps_bact_tick_sub, group = "tick_sp_short", 
                            index = "diversity_inverse_simpson",
                            group.colors = tickspcols,
                            label.format="p.format",
                            stats = TRUE)
invsimp_alpha_plot_tk = invsimp_alpha_plot_tk + ylab("Inverse Simpson Index") + xlab("") + labs(title = "Inverse Simpson")

#Merge all four alpha div plots into one figure
alphadiv_wp_tk <- ggarrange(obs_alpha_plot_tk, chao1_alpha_plot_tk, shan_alpha_plot_tk, invsimp_alpha_plot_tk,
                    labels = c("A", "B", "C", "D"),
                    ncol = 2, nrow = 2)
#ggsave("alphadiv_withpvalues_tk.pdf", plot = alphadiv_wp_tk, path = "output/plots", width = 30, height = 30, units = "cm")
alphadiv_wp_tk

2. Taxonomy

2.1. Taxa prevalence information

Using fast_melt() command to get an idea of most prevalent taxa and distribution

Plot Phylum - all samples

Melt phyloseq object to be able to group by taxa levels and filter for abundance and sample info.

# Using R code for taxa_summary.R function available here http://evomics.org/phyloseq/taxa_summary-r/
source("code/taxa_summary.R", local = TRUE)
##### All sample data
# Melt
mdt = fast_melt(ps_bact_samp)
# group by phyla
samp_phyla = mdt %>% 
  group_by(Phylum) %>% 
  summarize(sum = sum(count))

#### Blood samples
# Melt
mdt_bl = fast_melt(ps_bact_bl)
# group by family
bl_family = mdt_bl %>% 
  group_by(Family) %>% 
  summarize(sum = sum(count))

#### Tick samples
# Melt
mdt_tick = fast_melt(ps_bact_tick)
# group by family
tick_family = mdt_tick %>% 
  group_by(Family) %>% 
  summarize(sum = sum(count))

#### Tissue samples
# Tissue
mdt_tis = fast_melt(ps_bact_tis)
# group by family
tis_family = mdt_tis %>% 
  group_by(Family) %>% 
  summarize(sum = sum(count))

Make phyla plot for all samples

source("code/taxa_summary.R", local = TRUE)
# Melt
mdt = fast_melt(ps_bact_samp)
prevdt = mdt[, list(Prevalence = sum(count > 0), 
                    TotalCounts = sum(count)),
             by = TaxaID]
addPhylum = unique(copy(mdt[, list(TaxaID, Phylum)]))
# Join by TaxaID
setkey(prevdt, TaxaID)
setkey(addPhylum, TaxaID)
prevdt <- addPhylum[prevdt]
#Top 12 phyla
showPhyla = prevdt[, sum(TotalCounts), by = Phylum][order(-V1)][1:12]$Phylum
setkey(prevdt, Phylum)

Make plot for phyla

top_phyla1 = ggplot(prevdt[showPhyla], 
       mapping = aes(Prevalence, TotalCounts, color = Phylum)) + 
  geom_point(size = 2.5, alpha = 0.75) + 
  scale_y_log10() + theme_bw() + scale_color_viridis(discrete=TRUE) + ylab("Abundance") + labs(title = "Overall")
ggsave("top_phyla_all.pdf", plot = top_phyla1, path = "output/plots/tax_prev_abund", width = 15, height = 10, units = "cm")

Plot Phylum - by sample type

source("code/taxa_summary.R", local = TRUE)
### Blood
mdt_bl = fast_melt(ps_bact_bl)
prevdt = mdt_bl[, list(Prevalence = sum(count > 0), 
                    TotalCounts = sum(count)),
             by = TaxaID]
addPhylum = unique(copy(mdt_bl[, list(TaxaID, Phylum)]))
# Join by TaxaID
setkey(prevdt, TaxaID)
setkey(addPhylum, TaxaID)
prevdt <- addPhylum[prevdt]
# Top 12 phyla
showPhyla = prevdt[, sum(TotalCounts), by = Phylum][order(-V1)][1:12]$Phylum
setkey(prevdt, Phylum)
# Make plot for phyla
top_phyla_bl = ggplot(prevdt[showPhyla], 
       mapping = aes(Prevalence, TotalCounts, color = Phylum)) + 
  geom_point(size = 2.5, alpha = 0.75) + 
  scale_y_log10() + theme_bw() + scale_color_viridis(discrete=TRUE) + ylab("Abundance") + labs(title = "Blood")
ggsave("top_phyla_blood.pdf", plot = top_phyla_bl, path = "output/plots/tax_prev_abund", width = 15, height = 10, units = "cm")


### Tick
mdt_tick = fast_melt(ps_bact_tick)
prevdt = mdt_tick[, list(Prevalence = sum(count > 0), 
                        TotalCounts = sum(count)),
                 by = TaxaID]
addPhylum = unique(copy(mdt_tick[, list(TaxaID, Phylum)]))
# Join by TaxaID
setkey(prevdt, TaxaID)
setkey(addPhylum, TaxaID)
prevdt <- addPhylum[prevdt]
# Top 12 phyla
showPhyla = prevdt[, sum(TotalCounts), by = Phylum][order(-V1)][1:12]$Phylum
setkey(prevdt, Phylum)
# Make plot for phyla
top_phyla_tick = ggplot(prevdt[showPhyla], 
                       mapping = aes(Prevalence, TotalCounts, color = Phylum)) + 
  geom_point(size = 2.5, alpha = 0.75) + 
  scale_y_log10() + theme_bw() + scale_color_viridis(discrete=TRUE) + ylab("Abundance") + labs(title = "Tick")
ggsave("top_phyla_tick.pdf", plot = top_phyla_tick, path = "output/plots/tax_prev_abund", width = 15, height = 10, units = "cm")



### Tissue
mdt_tis = fast_melt(ps_bact_tis)
prevdt = mdt_tis[, list(Prevalence = sum(count > 0), 
                        TotalCounts = sum(count)),
                 by = TaxaID]
addPhylum = unique(copy(mdt_tis[, list(TaxaID, Phylum)]))
# Join by TaxaID
setkey(prevdt, TaxaID)
setkey(addPhylum, TaxaID)
prevdt <- addPhylum[prevdt]
# Top 12 phyla
showPhyla = prevdt[, sum(TotalCounts), by = Phylum][order(-V1)][1:12]$Phylum
setkey(prevdt, Phylum)
# Make plot for phyla
top_phyla_tis = ggplot(prevdt[showPhyla], 
                       mapping = aes(Prevalence, TotalCounts, color = Phylum)) + 
  geom_point(size = 2.5, alpha = 0.75) + 
  scale_y_log10() + theme_bw() + scale_color_viridis(discrete=TRUE) + ylab("Abundance") + labs(title = "Tissue")
ggsave("top_phyla_tissue.pdf", plot = top_phyla_tis, path = "output/plots/tax_prev_abund", width = 15, height = 10, units = "cm")

Merge all four phyla plots

top_phyla_merge <- ggarrange(top_phyla1, top_phyla_bl, top_phyla_tick, top_phyla_tis,
                    labels = c("A", "B", "C", "D"),
                    ncol = 2, nrow = 2)
ggsave("top_phyla_merge.pdf", plot = top_phyla_merge, path = "output/plots/tax_prev_abund", width = 30, height = 20, units = "cm")

Plot Family - blood samples

source("code/taxa_summary.R", local = TRUE)

mdt_bl = fast_melt(ps_bact_bl)
prevdt_bl = mdt_bl[, list(Prevalence = sum(count > 0), 
                    TotalCounts = sum(count)),
             by = TaxaID]
addFamily_bl = unique(copy(mdt_bl[, list(TaxaID, Family)]))
# Join by TaxaID
setkey(prevdt_bl, TaxaID)
setkey(addFamily_bl, TaxaID)
prevdt_bl <- addFamily_bl[prevdt_bl]
#Top 12 Family
showFamily_bl = prevdt_bl[, sum(TotalCounts), by = Family][order(-V1)][1:12]$Family
setkey(prevdt_bl, Family)

Make plot for phyla

top_Family_bl = ggplot(prevdt_bl[showFamily_bl], 
       mapping = aes(Prevalence, TotalCounts, color = Family)) + 
  geom_point(size = 2.5, alpha = 0.75) + 
  scale_y_log10() + theme_bw() + scale_color_viridis(discrete=TRUE)




ggsave("top_phyla.pdf", plot = top_phyla, path = "output/plots", width = 40, height = 15, units = "cm")

# Alternative plot
top_phyla3 = top_phyla1 + facet_wrap(~Phylum)
ggsave("top_phyla3.pdf", plot = top_phyla3, path = "output/plots", width = 40, height = 30, units = "cm")

2.2. Heatmap

2.2.1 ampvis2

Make frequency heat map using ampvis2

Aggregate taxa, showing 40 most abundant by Family species, facet by sample type

# Relative abundance
heatmap_rel <- amp_heatmap(amp_samp,
            facet_by = "SampleCategory",
            group_by = "species",
            tax_aggregate = "Family",
            tax_show = 40,
            normalise = TRUE,
            plot_values = FALSE,
            plot_values_size = 3,
            round = 0, color_vector = c("white", "#e5ba52", "#ab7ca3", "#9d02d7", "#0030bf"), plot_colorscale = "log10")  +
  theme(axis.text.x = element_text(angle = 45, size=10, vjust = 1),
        axis.text.y = element_text(size=10),
        legend.position="right")
heatmap_rel
# Count of sequences
heatmap_count <- amp_heatmap(amp_samp,
            facet_by = "SampleCategory",
            group_by = "species",
            tax_aggregate = "Family",
            tax_show = 40,
            normalise = FALSE,
            plot_values = FALSE,
            plot_values_size = 3,
            round = 0, color_vector = c("white", "#e5ba52", "#ab7ca3", "#9d02d7", "#0030bf"), plot_colorscale = "log10") + theme(axis.text.x = element_text(angle = 45, size=10, vjust = 1), axis.text.y = element_text(size=10), legend.position="right")

Save PDF of plots

ggsave("heatmap_rel.pdf", plot = heatmap_rel, path = "output/plots", width = 40, height = 25, units = "cm")
ggsave("heatmap_count.pdf", plot = heatmap_count, path = "output/plots", width = 40, height = 25, units = "cm")

Heat map with more detail for tick samples only. Aggregate taxa, showing 40 most abundant by Family species, facet by sample type

amp_tick <- amp_subset_samples(amp_samp, SampleCategory %in% c("Tick"))
amp_tick_sub <- amp_subset_samples(amp_tick,
                                    tick_sp_short %in% 
                                     c("Ixhol", "Ixtri", "Ixtas", "Ixant", "Ixaus", "Amtri"))
# Relative abundance
hmap_rel_tick <- amp_heatmap(amp_tick_sub,
            facet_by = "tick_sp_short",
            group_by = "instar",
            tax_aggregate = "Family",
            tax_show = 15,
            normalise = TRUE,
            plot_values = FALSE,
            plot_values_size = 3,
            round = 0, color_vector = c("white", "#e5ba52", "#ab7ca3", "#9d02d7", "#0030bf"), plot_colorscale = "log10")  +
  theme(axis.text.x = element_text(angle = 45, size=10, vjust = 1),
        axis.text.y = element_text(size=10),
        legend.position="right")
hmap_rel_tick

Save plot

ggsave("heatmap_rel_tick.pdf", plot = hmap_rel_tick, path = "output/plots", width = 40, height = 25, units = "cm")

2.2.2 Microbiome utlities

# Subset black rat and bandicoot
ps_bact_blsub = subset_samples(ps_bact_bl, species !="Quenda")
ps_bact_blsub = subset_samples(ps_bact_blsub, species !="Swamp rat")
# aggregate to family lebel
ps_bact_blsubF <- aggregate_taxa(ps_bact_blsub, "Family")

# create a gradient color palette for abundance
grad_ab <- colorRampPalette(c("#faf3dd","#f7d486" ,"#5e6472"))
grad_ab_pal <- grad_ab(10)

# create a color palette for varaibles of interest
meta_colors <- list(c("Black rat" = "#FFC857", 
                      "Brush tail possum" = "#05B083",
                      "Long-nosed bandicoot" = "#5e6472",
                      "Bush rat" = "blue",
                      "Chuditch" = "pink"),
                    c("Blood" = "#7a255d", 
                      "Tick" = "#9fd0cb", 
                      "Tissue"="#7566ff"))
# add labels for pheatmap to detect
names(meta_colors) <- c("species", "SampleCategory")

heatmap_2 <- plot_taxa_heatmap(ps_bact_blsubF,
                       subset.top = 20,
                       VariableA = c("species","SampleCategory"),
                       heatcolors = grad_ab_pal, #rev(brewer.pal(6, "RdPu")),
                       transformation = "log10",
                       cluster_rows = T,
                       cluster_cols = F,
                       show_colnames = F,
                       annotation_colors=meta_colors #, show_rownames = F)

2.3. Plotting selected taxa

Plotting selected taxa of interest following microbiomeutilities

Aggregate taxonomy to best family level and select taxa of interest

ps_bact_sampF <- aggregate_taxa(ps_bact_samp, "Family")
# Identify top 40 taxa
top_taxa(ps_bact_sampF, 40)
select.taxa <- c("Anaplasmataceae", 
                 "Midichloriaceae",
                 "Bacteria_Proteobacteria_Gammaproteobacteria_Legionellales_Coxiellaceae",
                 "Bartonellaceae",
                 "Mycobacteriaceae",
                 "Rickettsiaceae",
                 "Borreliaceae",
                 "Francisellaceae",
                 "Mycoplasmataceae")

Now plot selected taxa with statistical analysis

plot_select_taxa <- plot_listed_taxa(ps_bact_sampF, select.taxa, 
                      group= "SampleCategory",
                      group.colors = ColSampCat,
                      add.violin = TRUE,
                      violin.opacity = 0.3,
                      dot.opacity = 0.25,
                      box.opacity = 0.25,
                      panel.arrange= "wrap")

# Add statistical analysis
comps <- make_pairs(sample_data(ps_bact_sampF)$SampleCategory)
plot_select_taxa1 <- plot_select_taxa + stat_compare_means(
  comparisons = comps,
  label = "p.format",
  tip.length = 0.05,
  method = "wilcox.test",
  size = 3) + xlab("Sample type") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)

# Save plot as PDF
#ggsave("plot_select_taxa.pdf", plot = plot_select_taxa1, path = "output/plots", width = 30, height = 30, units = "cm")
plot_select_taxa1

Repeat process - this time among host species

# identify number of species
n_distinct(ps_bact_samp@sam_data$species)
# set colours
speciescol = viridis(10, option = "B")

# Make plot
plot_select_taxa_sp <- plot_listed_taxa(ps_bact_sampF, select.taxa, 
                      group= "species",
                      group.colors = speciescol,
                      add.violin = TRUE,
                      violin.opacity = 0.3,
                      dot.opacity = 0.25,
                      box.opacity = 0.25,
                      panel.arrange= "wrap")

# Add statistical analysis
comps <- make_pairs(sample_data(ps_bact_sampF)$species)
plot_select_taxa_sp1 <- plot_select_taxa_sp + stat_compare_means(
  comparisons = comps,
  label = "p.format",
  tip.length = 0.05,
  method = "wilcox.test",
  size = 3) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)

### For visualization saving option with and without p values
# Customise plot
plot_select_taxa_sp3 = plot_select_taxa_sp + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) +  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
plot_select_taxa_sp4 = plot_select_taxa_sp1 + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) +  scale_x_discrete(labels = function(x) str_wrap(x, width = 10))

# Save plot as PDF
ggsave("plot_select_taxa_sp1.pdf", plot = plot_select_taxa_sp3, path = "output/plots", width = 30, height = 30, units = "cm")
ggsave("plot_select_taxa_sp2.pdf", plot = plot_select_taxa_sp4, path = "output/plots", width = 30, height = 30, units = "cm")

Now rotate through this to separate sample types and species

1. Blood

ps_bact_blF <- aggregate_taxa(ps_bact_bl, "Family")
# Identify top 40 taxa
select.taxabl <- c("Anaplasmataceae",
                 "Bacteria_Proteobacteria_Gammaproteobacteria_Legionellales_Coxiellaceae",
                 "Bartonellaceae",
                 "Mycobacteriaceae",
                 "Mycoplasmataceae")

# identify number of species
n_distinct(ps_bact_bl@sam_data$species)
# set colours
speciesblcol = viridis(7, option = "D")

# Now plot selected taxa with statistical analysis

plot_select_taxa_bl <- plot_listed_taxa(ps_bact_blF, select.taxabl, 
                      group= "species",
                      group.colors = speciesblcol,
                      add.violin = TRUE,
                      violin.opacity = 0.3,
                      dot.opacity = 0.25,
                      box.opacity = 0.25,
                      panel.arrange= "wrap")

# Add statistical analysis
comps <- make_pairs(sample_data(ps_bact_blF)$species)
plot_select_taxa_bl1 <- plot_select_taxa_bl + stat_compare_means(
  comparisons = comps,
  label = "p.format",
  tip.length = 0.05,
  method = "wilcox.test",
  size = 3) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)


### For visualization saving option with and without p values
# Customise plot
plot_select_taxa_bl3 = plot_select_taxa_bl + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) +  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
plot_select_taxa_bl4 = plot_select_taxa_bl1 + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) +  scale_x_discrete(labels = function(x) str_wrap(x, width = 10))

# Save plot as PDF
ggsave("plot_select_taxa_bl1.pdf", plot = plot_select_taxa_bl3, path = "output/plots", width = 30, height = 20, units = "cm")
ggsave("plot_select_taxa_bl2.pdf", plot = plot_select_taxa_bl4, path = "output/plots", width = 30, height = 20, units = "cm")

2. Ticks

ps_bact_tickF <- aggregate_taxa(ps_bact_tick, "Family")
# Identify top 40 taxa
select.taxa.tick <- c("Anaplasmataceae", 
                 "Midichloriaceae",
                 "Bacteria_Proteobacteria_Gammaproteobacteria_Legionellales_Coxiellaceae", 
                 "Bartonellaceae",
                 "Mycobacteriaceae",
                 "Rickettsiaceae",
                 "Francisellaceae")

# identify number of species
n_distinct(ps_bact_tick@sam_data$species)
# set colours
speciestickcol = viridis(8, option = "D")

# Now plot selected taxa with statistical analysis

plot_select_taxa_tick <- plot_listed_taxa(ps_bact_tickF, select.taxa.tick, 
                      group= "species",
                      group.colors = speciestickcol,
                      add.violin = TRUE,
                      violin.opacity = 0.3,
                      dot.opacity = 0.25,
                      box.opacity = 0.25,
                      panel.arrange= "wrap")

# Add statistical analysis
comps <- make_pairs(sample_data(ps_bact_tickF)$species)
plot_select_taxa_tick1 <- plot_select_taxa_tick + stat_compare_means(
  comparisons = comps,
  label = "p.format",
  tip.length = 0.05,
  method = "wilcox.test",
  size = 3) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)


### For visualization saving option with and without p values
# Customise plot
plot_select_taxa_tick3 = plot_select_taxa_tick + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) +  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
plot_select_taxa_tick4 = plot_select_taxa_tick1 + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) +  scale_x_discrete(labels = function(x) str_wrap(x, width = 10))

# Save plot as PDF
ggsave("plot_select_taxa_tick1.pdf", plot = plot_select_taxa_tick3, path = "output/plots", width = 30, height = 30, units = "cm")
ggsave("plot_select_taxa_tick2.pdf", plot = plot_select_taxa_tick4, path = "output/plots", width = 30, height = 30, units = "cm")

3. Tissue

ps_bact_tisF <- aggregate_taxa(ps_bact_tis, "Family")
# Identify top 40 taxa
select.taxa.tis <- c("Anaplasmataceae", 
                 "Midichloriaceae",
                 "Bacteria_Proteobacteria_Gammaproteobacteria_Legionellales_Coxiellaceae",
                 "Bartonellaceae",
                 "Mycobacteriaceae",
                 "Borreliaceae",
                 "Mycoplasmataceae")

# identify number of species
n_distinct(ps_bact_tis@sam_data$species)
# set colours
speciestiscol = viridis(9, option = "D")

# Now plot selected taxa with statistical analysis

plot_select_taxa_tis <- plot_listed_taxa(ps_bact_tisF, select.taxa.tis, 
                      group= "species",
                      group.colors = speciestiscol,
                      add.violin = TRUE,
                      violin.opacity = 0.3,
                      dot.opacity = 0.25,
                      box.opacity = 0.25,
                      panel.arrange= "wrap")

# Add statistical analysis
comps <- make_pairs(sample_data(ps_bact_tisF)$species)
plot_select_taxa_tis1 <- plot_select_taxa_tis + stat_compare_means(
  comparisons = comps,
  label = "p.format",
  tip.length = 0.05,
  method = "wilcox.test",
  size = 3) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)


### For visualization saving option with and without p values
# Customise plot
plot_select_taxa_tis3 = plot_select_taxa_tis + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) +  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
plot_select_taxa_tis4 = plot_select_taxa_tis1 + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) +  scale_x_discrete(labels = function(x) str_wrap(x, width = 10))

# Save plot as PDF
ggsave("plot_select_taxa_tis1.pdf", plot = plot_select_taxa_tis3, path = "output/plots", width = 30, height = 30, units = "cm")
ggsave("plot_select_taxa_tis2.pdf", plot = plot_select_taxa_tis4, path = "output/plots", width = 30, height = 30, units = "cm")

More detailed info on Anaplasmataceae family and tick species

# Subset Ixodes holocyclus
Ixhol = subset_samples(ps_bact_tick, tick_sp_short == "Ixhol")
IxholF <- aggregate_taxa(Ixhol, "Family")
# Select just Anaplasmataceae
select.taxa_ana <- c("Anaplasmataceae")

# identify number of species
n_distinct(IxholF@sam_data$instar)
# set colours
instarcol = viridis(4, option = "D")

# Now plot selected taxa with statistical analysis

ixhol_ana <- plot_listed_taxa(IxholF, select.taxa_ana, 
                      group= "instar",
                      group.colors = instarcol,
                      add.violin = TRUE,
                      violin.opacity = 0.3,
                      dot.opacity = 0.25,
                      box.opacity = 0.25,
                      panel.arrange= "wrap")

# Add statistical analysis
comps <- make_pairs(sample_data(IxholF)$instar)
ixhol_ana1 <- ixhol_ana + stat_compare_means(
  comparisons = comps,
  label = "p.format",
  tip.length = 0.05,
  method = "wilcox.test",
  size = 3) + xlab("Instar") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)

### For visualization saving option with and without p values
# Customise plot
plot_select_taxa_bl3 = plot_select_taxa_bl + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) +  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
plot_select_taxa_bl4 = plot_select_taxa_bl1 + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) +  scale_x_discrete(labels = function(x) str_wrap(x, width = 10))

# Save plot as PDF
ggsave("plot_select_taxa_bl1.pdf", plot = plot_select_taxa_bl3, path = "output/plots", width = 30, height = 20, units = "cm")
ggsave("plot_select_taxa_bl2.pdf", plot = plot_select_taxa_bl4, path = "output/plots", width = 30, height = 20, units = "cm")

2.4. Plotting top taxa

Plot top taxa using microbiomeutilities and include statistics to compare means

All samples

Plot for all samples and separate by sample type, include top 9 taxa (family level)

ColSampCat = c("#7a255d", "#9fd0cb", "#7566ff")
top_taxa_all <- plot_taxa_boxplot(ps_bact_samp,
                        taxonomic.level = "Family",
                        top.otu = 9, 
                        group = "SampleCategory",
                        add.violin= TRUE,
                        title = "Top nine family", 
                        keep.other = FALSE,
                        group.order = c("Blood","Tick","Tissue"),
                        group.colors = ColSampCat,
                        dot.size = 1)
#print(pn + theme_biome_utils())

# Statistical analysis (non-parametric)
comps <- make_pairs(sample_data(ps_bact_samp)$SampleCategory)
print(comps)
top_taxa_all1 <- top_taxa_all + stat_compare_means(
      comparisons = comps,
      label = "p.format",
      tip.length = 0.05,
      method = "wilcox.test", size = 3)
top_taxa_all2 = top_taxa_all1 + scale_y_continuous(labels = scales::percent) + theme_biome_utils() + theme(legend.position="none")

#Save plot
#ggsave("top_taxa_all2.pdf", plot = top_taxa_all2, path = "output/plots", width = 30, height = 30, units = "cm")
top_taxa_all2

Now plot separate sample categories and separate by species

1. Blood

# set colours
speciesblcol = viridis(7, option = "D")
top_taxa_bl <- plot_taxa_boxplot(ps_bact_bl,
                        taxonomic.level = "Family",
                        top.otu = 6, 
                        group = "species",
                        add.violin= TRUE,
                        title = "Blood", 
                        keep.other = FALSE,
                        group.colors = speciesblcol,
                        group.order = c("Black rat","Brush tail possum","Bush rat","Chuditch","Long-nosed bandicoot","Quenda","Swamp rat"),
                        dot.size = 1)

# Statistical analysis (non-parametric)
comps <- make_pairs(sample_data(ps_bact_bl)$species)
print(comps)
top_taxa_bl1 <- top_taxa_bl + stat_compare_means(
      comparisons = comps,
      label = "p.format",
      tip.length = 0.05,
      method = "wilcox.test", size = 3)
top_taxa_bl2 = top_taxa_bl1 + scale_y_continuous(labels = scales::percent) + theme_biome_utils() + theme(legend.position="none")


# Customise plot - make 2 (1 with and one without p values)
top_taxa_bl3 = top_taxa_bl + theme_biome_utils()
top_taxa_bl3 = top_taxa_bl3 + theme(legend.position="none", axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) +  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)

top_taxa_bl4 = top_taxa_bl1 + theme_biome_utils()
top_taxa_bl4 = top_taxa_bl4 + theme(legend.position="none", axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) +  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)

# Save plot as PDF
ggsave("top_taxa_bl1.pdf", plot = top_taxa_bl3, path = "output/plots", width = 30, height = 20, units = "cm")
ggsave("top_taxa_bl2.pdf", plot = top_taxa_bl4, path = "output/plots", width = 30, height = 20, units = "cm")

2. Tick

#set colours
speciestickcol = viridis(8, option = "D")
top_taxa_tick <- plot_taxa_boxplot(ps_bact_tick,
                        taxonomic.level = "Family",
                        top.otu = 6, 
                        group = "species",
                        add.violin= TRUE,
                        title = "Tick", 
                        keep.other = FALSE,
                        group.colors = speciestickcol,
                        group.order = c("Black rat","Brown antechinus", "Brush tail possum","Bush rat","Chuditch","Long-nosed bandicoot","Quenda", "Rabbit"),
                        dot.size = 1)

# Statistical analysis (non-parametric)
comps <- make_pairs(sample_data(ps_bact_tick)$species)
print(comps)
top_taxa_tick1 <- top_taxa_tick + stat_compare_means(
      comparisons = comps,
      label = "p.format",
      tip.length = 0.05,
      method = "wilcox.test", size = 3)
top_taxa_tick2 = top_taxa_tick1 + scale_y_continuous(labels = scales::percent) + theme_biome_utils() + theme(legend.position="none")


# Customise plot - make 2 (1 with and one without p values)
top_taxa_tick3 = top_taxa_tick + theme_biome_utils()
top_taxa_tick3 = top_taxa_tick3 + theme(legend.position="none", axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) +  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)

top_taxa_tick4 = top_taxa_tick1 + theme_biome_utils()
top_taxa_tick4 = top_taxa_tick4 + theme(legend.position="none", axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) +  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)

# Save plot as PDF
ggsave("top_taxa_tick1.pdf", plot = top_taxa_tick3, path = "output/plots", width = 30, height = 20, units = "cm")
ggsave("top_taxa_tick2.pdf", plot = top_taxa_tick4, path = "output/plots", width = 30, height = 20, units = "cm")

3. Tissue

speciestiscol = viridis(9, option = "D")
top_taxa_tis <- plot_taxa_boxplot(ps_bact_tis,
                        taxonomic.level = "Family",
                        top.otu = 6, 
                        group = "species",
                        add.violin= TRUE,
                        title = "Tissue", 
                        keep.other = FALSE,
                        group.colors = speciestiscol,
                        group.order = c("Black rat","Brown antechinus", "Brush tail possum","Bush rat","Chuditch","Deer", "Long-nosed bandicoot","Quenda", "Rabbit", "Swamp rat"),
                        dot.size = 1)

# Statistical analysis (non-parametric)
comps <- make_pairs(sample_data(ps_bact_tis)$species)
print(comps)
top_taxa_tis1 <- top_taxa_tis + stat_compare_means(
      comparisons = comps,
      label = "p.format",
      tip.length = 0.05,
      method = "wilcox.test", size = 3)
top_taxa_tis2 = top_taxa_tis1 + scale_y_continuous(labels = scales::percent) + theme_biome_utils() + theme(legend.position="none")


# Customise plot - make 2 (1 with and one without p values)
top_taxa_tis3 = top_taxa_tis + theme_biome_utils()
top_taxa_tis3 = top_taxa_tis3 + theme(legend.position="none", axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) +  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)

top_taxa_tis4 = top_taxa_tis1 + theme_biome_utils()
top_taxa_tis4 = top_taxa_tis4 + theme(legend.position="none", axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) +  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)

# Save plot as PDF
ggsave("top_taxa_tis1.pdf", plot = top_taxa_tis3, path = "output/plots", width = 30, height = 20, units = "cm")
ggsave("top_taxa_tis2.pdf", plot = top_taxa_tis4, path = "output/plots", width = 30, height = 20, units = "cm")

3. Abundance-Prevalence relationship

Compare abundance vs prevalence for top taxa from Long-nosed bandicoot blood samples microbiomeutelities

ps_bact_bl = subset_samples(ps_bact_samp, SampleCategory == "Blood")

asv_ps <- microbiome::transform(ps_bact_bl, "compositional")
# Select healthy samples
asv_ps <- subset_samples(asv_ps, species=="Long-nosed bandicoot")
asv_ps <- core(asv_ps,detection = 0.001, prevalence = 0.05) # reduce size for example
asv_ps <- format_to_besthit(asv_ps)

set.seed(14285)
p_v <- plot_abund_prev(asv_ps, 
                       label.core = TRUE,
                       color = "Phylum", # NA or "blue"
                       mean.abund.thres = 0.01,
                       mean.prev.thres = 0.99,
                       dot.opacity = 0.7,
                       label.size = 3,
                       label.opacity = 1.0,
                       nudge.label=-0.15,
                       bs.iter=9, # increase for actual analysis e.g. 999
                       size = 29, # increase to match your nsamples(asv_ps)
                       replace = TRUE,
                       label.color="#5f0f40")  
prev_abund <- p_v + 
  geom_vline(xintercept = 0.25, lty="dashed", alpha=0.7) + 
  geom_hline(yintercept = 0.005,lty="dashed", alpha=0.7) +
  scale_color_viridis(discrete=TRUE)
prev_abund

4. Venn diagram

Venn diagram

4.1. Compare `SampleCategory`

Make venn diagram to show overlap between blood, tick and tissue samples

# how many samples in each category
table(meta(ps_bact_samp)$SampleCategory, useNA = "always")
# transform data
ps_bact_samprel <- microbiome::transform(ps_bact_samp, "compositional")
# assign best hit
ps_bact_samprel.f <- format_to_besthit(ps_bact_samprel)
# Make a list of SampleCategory
sample_cat <- unique(as.character(meta(ps_bact_samprel.f)$SampleCategory))
print(sample_cat)

Format to best hit and then Re-Write a for loop to go through each of the disease_states one by one and combine identified core taxa into a list. Loop through for three different abundance/prevalence situations

option a

list_core <- c() # an empty object to store information

for (n in sample_cat){ # for each variable n in SampleCategory
    #print(paste0("Identifying Core Taxa for ", n))
    
    ps.sub <- subset_samples(ps_bact_samprel.f, SampleCategory == n) # Choose sample from SampleCategory by n
    
    core_m <- core_members(ps.sub, # ps.sub is phyloseq selected with only samples from g 
                           detection = 0.001, # 0.001 in at least 50% samples 
                           prevalence = 0.05)
    print(paste0("No. of core taxa in ", n, " : ", length(core_m))) # print core taxa identified in each SampleCategory
    list_core[[n]] <- core_m # add to a list core taxa for each group.

    #print(list_core)
}
vennpl1 = plot(venn(list_core),
     fills = ColSampCat)

option b

list_core <- c() # an empty object to store information

for (n in sample_cat){ # for each variable n in SampleCategory
    #print(paste0("Identifying Core Taxa for ", n))
    
    ps.sub <- subset_samples(ps_bact_samprel.f, SampleCategory == n) # Choose sample from SampleCategory by n
    
    core_m <- core_members(ps.sub, # ps.sub is phyloseq selected with only samples from g 
                           detection = 0.001, # 0.001 in at least 0.1% samples 
                           prevalence = 0.001)
    print(paste0("No. of core taxa in ", n, " : ", length(core_m))) # print core taxa identified in each SampleCategory
    list_core[[n]] <- core_m # add to a list core taxa for each group.

    #print(list_core)
}
vennpl2 = plot(venn(list_core),
     fills = ColSampCat)

option c

#formt to best hit
list_core <- c() # an empty object to store information

for (n in sample_cat){ # for each variable n in SampleCategory
    #print(paste0("Identifying Core Taxa for ", n))
    
    ps.sub <- subset_samples(ps_bact_samprel.f, SampleCategory == n) # Choose sample from SampleCategory by n
    
    core_m <- core_members(ps.sub, # ps.sub is phyloseq selected with only samples from g 
                           detection = 0.0001, # 0.0001 in at least 50% samples 
                           prevalence = 0.05)
    print(paste0("No. of core taxa in ", n, " : ", length(core_m))) # print core taxa identified in each SampleCategory
    list_core[[n]] <- core_m # add to a list core taxa for each group.

    #print(list_core)
}
vennpl3 = plot(venn(list_core),
     fills = ColSampCat)

Merge into one figure

vennplot <- ggarrange(vennpl1, vennpl2, vennpl3,
                    labels = c("A", "B", "C"),
                    ncol = 3, nrow = 1)
#ggsave("vennplot.pdf", plot = vennplot, path = "output/plots", width = 25, height = 10, units = "cm")
vennplot

4.2. Compare `species` by `SampleCategory`

Set colours so they are the same for each species

# Use viridis package to retrieve 4 colours for species
viridis(4, option = "D")
# Then lighten colours using https://projects.susielu.com/viz-palette
spcols <- c("Black rat"="#723281", "Brush tail possum"="#6396BE", "Chuditch"="#6EEAA8", "Long-nosed bandicoot"="#FFFF63")

BLOOD

Make venn diagram to show overlap top species for blood samples.

# how many samples in each category
table(meta(ps_bact_bl)$species, useNA = "always")
# subset species
blood_sub <- subset_samples(ps_bact_bl, species %in% c("Black rat", "Brush tail possum", "Chuditch", "Long-nosed bandicoot"))
# transform data
blood_subrel <- microbiome::transform(blood_sub, "compositional")
# assign best hit
blood_subrel.f <- format_to_besthit(blood_subrel)
# Make a list of SampleCategory
species_cat <- unique(as.character(meta(blood_subrel.f)$species))
print(species_cat)

Format to best best and then Re-Write a for loop to go through each of the disease_states one by one and combine identified core taxa into a list. Loop through for three different abundance/prevalence situations - using option a patameters

list_core <- c() # an empty object to store information

for (n in species_cat){ # for each variable n in SampleCategory
    #print(paste0("Identifying Core Taxa for ", n))
    
    ps.sub <- subset_samples(blood_subrel.f, species == n) # Choose sample from SampleCategory by n
    
    core_m <- core_members(ps.sub, # ps.sub is phyloseq selected with only samples from g 
                           detection = 0.001, # 0.001 in at least 50% samples 
                           prevalence = 0.05)
    print(paste0("No. of core taxa in ", n, " : ", length(core_m))) # print core taxa identified in each SampleCategory
    list_core[[n]] <- core_m # add to a list core taxa for each group.

    #c
}
blspcols <- c("#723281", "#6396BE", "#6EEAA8", "#FFFF63") 
vennpl_bl = plot(venn(list_core),
     fills = blspcols)

Tissue

Make venn diagram to show overlap top species for tissue samples.

# how many samples in each category
table(meta(ps_bact_tis)$species, useNA = "always")
# subset species
tissue_sub <- subset_samples(ps_bact_tis, species %in% c("Black rat", "Brush tail possum",  "Chuditch", "Long-nosed bandicoot"))
# transform data
tissue_subrel <- microbiome::transform(tissue_sub, "compositional")
# assign best hit
tissue_subrel.f <- format_to_besthit(tissue_subrel)
# Make a list of SampleCategory
species_cat <- unique(as.character(meta(tissue_subrel.f)$species))
print(species_cat)

list_core <- c() # an empty object to store information

for (n in species_cat){ # for each variable n in SampleCategory
    #print(paste0("Identifying Core Taxa for ", n))
    
    ps.sub <- subset_samples(tissue_subrel.f, species == n) # Choose sample from SampleCategory by n
    
    core_m <- core_members(ps.sub, # ps.sub is phyloseq selected with only samples from g 
                           detection = 0.001, # 0.001 in atleast 90% samples 
                           prevalence = 0.05)
    print(paste0("No. of core taxa in ", n, " : ", length(core_m))) # print core taxa identified in each SampleCategory
    list_core[[n]] <- core_m # add to a list core taxa for each group.

    #print(list_core)
}
tisspcols <- c("#6EEAA8", "#6396BE", "#723281", "#FFFF63") 
vennpl_tis = plot(venn(list_core),
     fills = tisspcols)

Merge into one figure

vennplot_bltis <- ggarrange(vennpl_bl, vennpl_tis,
                    labels = c("Blood", "Tissue"),
                    ncol = 2, nrow = 1)
ggsave("vennplot_bltis.pdf", plot = vennplot_bltis, path = "output/plots", width = 25, height = 10, units = "cm")

4.3. Compare `tick_species`

Make venn diagram to show overlap top species for blood samples.

# how many samples in each category
table(meta(ps_bact_tick)$tick_species, useNA = "always")
# subset species
tick_sub <- subset_samples(ps_bact_tick, tick_species %in% c("Amblyomma triguttatum", "Ixodes holocyclus", "Ixodes tasmani", "Ixodes trichosuri"))
# transform data
tick_subrel <- microbiome::transform(tick_sub, "compositional")
# assign best hit
tick_subrel.f <- format_to_besthit(tick_subrel)
# Make a list of SampleCategory
species_cat <- unique(as.character(meta(tick_subrel.f)$tick_species))
print(species_cat)

list_core <- c() # an empty object to store information

for (n in species_cat){ # for each variable n in SampleCategory
    #print(paste0("Identifying Core Taxa for ", n))
    
    ps.sub <- subset_samples(tick_subrel.f, tick_species == n) # Choose sample from SampleCategory by n
    
    core_m <- core_members(ps.sub, # ps.sub is phyloseq selected with only samples from g 
                           detection = 0.001, # 0.001 in atleast 90% samples 
                           prevalence = 0.05)
    print(paste0("No. of core taxa in ", n, " : ", length(core_m))) # print core taxa identified in each SampleCategory
    list_core[[n]] <- core_m # add to a list core taxa for each group.

    #c
}
tickspcolssub = c("#BB3754FF","#c5cc00", "#FCB519FF", "#D3D3D3")
#tickspcols <- c("#723281", "#6396BE", "#6EEAA8", "#FFFF63") 
vennpl_tick = plot(venn(list_core),
     fills = tickspcolssub)

Save figure

ggsave("vennpl_tick.pdf", plot = vennpl_tick, path = "output/plots", width = 15, height = 10, units = "cm")

5. Ordination

You will need to have a play around with the difference options: ordination method, data transformation and distance measure to see which one best explains the data. So I suggest try to keep good code here with lots of notes.

5.1. Ampvis2 ordination

Ordination plots by documentation. Plots PCA, RDA, CA, CCS, DCA, NMDS, PCOA, and MMDS. Note when performing NMDS do not transform data - i.e. transform = "none". For PCoA and MMDS need to include a distmeasure option. Valid options outlined here (note some require a phylogenetic tree). Using transform = “Hellinger” and distmeasure = “bray” and “jaccard” for PCOA and MMDS

All samples, and sort by sample type.

#Filter out samples with <1000 reads after QC
amp_samp1000 <- amp_subset_samples(amp_samp,
                                 minreads = 1000,
                                 RemoveAbsents = TRUE)
# CCA Canonical Correspondence Analysis (considered the constrained version of CA)
ord_CCA <- amp_ordinate(amp_samp1000,
                        type = "CCA",
                        constrain = "SampleCategory",
                        transform = "Hellinger",
                        sample_color_by = "SampleCategory",
                        sample_shape_by = "species",
                        sample_colorframe = TRUE,
                        sample_colorframe_label = "SampleCategory",
                        detailed_output = TRUE)
# CA Correspondence Analysis
ord_CA <- amp_ordinate(amp_samp1000,
                        type = "CA",
                        constrain = "SampleCategory",
                        transform = "Hellinger",
                        sample_color_by = "SampleCategory",
                        sample_shape_by = "species",
                        sample_colorframe = TRUE,
                        sample_colorframe_label = "SampleCategory",
                        detailed_output = TRUE)
# PCA Principal Components Analysis
ord_PCA <- amp_ordinate(amp_samp1000,
                        type = "PCA",
                        constrain = "SampleCategory",
                        transform = "Hellinger",
                        sample_color_by = "SampleCategory",
                        sample_shape_by = "species",
                        sample_colorframe = TRUE,
                        sample_colorframe_label = "SampleCategory",
                        detailed_output = TRUE)
# NMDS non-metric Multidimensional Scaling
ord_NMDS <- amp_ordinate(amp_samp1000,
                        type = "NMDS",
                        constrain = "SampleCategory",
                        transform = "Hellinger",
                        sample_color_by = "SampleCategory",
                        sample_shape_by = "species",
                        sample_colorframe = TRUE,
                        sample_colorframe_label = "SampleCategory",
                        detailed_output = TRUE)
# RDA  Redundancy Analysis (considered the constrained version of PCA)
ord_RDA <- amp_ordinate(amp_samp1000,
                        type = "RDA",
                        constrain = "SampleCategory",
                        transform = "Hellinger",
                        sample_color_by = "SampleCategory",
                        sample_shape_by = "species",
                        sample_colorframe = TRUE,
                        sample_colorframe_label = "SampleCategory",
                        detailed_output = TRUE)
# PCOA Principal Coordinates Analysis Jaccard
ord_PCOA_jac <- amp_ordinate(amp_samp1000,
                        type = "PCOA",
                        constrain = "SampleCategory",
                        distmeasure = "jaccard",
                        transform = "none",
                        sample_color_by = "SampleCategory",
                        sample_shape_by = "species",
                        sample_colorframe = TRUE,
                        sample_colorframe_label = "SampleCategory",
                        detailed_output = TRUE)
# PCOA Principal Coordinates Analysis Bray
ord_PCOA_bray <- amp_ordinate(amp_samp1000,
                        type = "PCOA",
                        constrain = "SampleCategory",
                        distmeasure = "bray",
                        transform = "none",
                        sample_color_by = "SampleCategory",
                        sample_shape_by = "species",
                        sample_colorframe = TRUE,
                        sample_colorframe_label = "SampleCategory",
                        detailed_output = TRUE)
# MMDS metric Multidimensional Scaling jaccard
ord_MMDS_jac <- amp_ordinate(amp_samp1000,
                        type = "MMDS",
                        constrain = "SampleCategory",
                        distmeasure = "jaccard",
                        transform = "none",
                        sample_color_by = "SampleCategory",
                        sample_shape_by = "species",
                        sample_colorframe = TRUE,
                        sample_colorframe_label = "SampleCategory",
                        detailed_output = TRUE)
# MMDS metric Multidimensional Scaling Bray
ord_MMDS_bray <- amp_ordinate(amp_samp1000,
                        type = "MMDS",
                        constrain = "SampleCategory",
                        distmeasure = "bray",
                        transform = "none",
                        sample_color_by = "SampleCategory",
                        sample_shape_by = "species",
                        sample_colorframe = TRUE,
                        sample_colorframe_label = "SampleCategory",
                        detailed_output = TRUE)

Customise and save plots

ord_CCA_plot = ord_CCA$plot + scale_shape_manual(values=seq(0,10)) + scale_colour_manual(values = c("#7a255d", "#9fd0cb", "#7566ff"))
ggsave("ord_CCA.pdf", plot = ord_CCA_plot, path = "output/plots", width = 15, height = 15, units = "cm")
ord_CA_plot <- ord_CA$plot + scale_shape_manual(values=seq(0,10)) + scale_colour_manual(values = c("#7a255d", "#9fd0cb", "#7566ff"))
ggsave("ord_CA.pdf", plot = ord_CA_plot, path = "output/plots", width = 15, height = 15, units = "cm")
ord_RDA_plot = ord_RDA$plot + scale_shape_manual(values=seq(0,10)) + scale_colour_manual(values = c("#7a255d", "#9fd0cb", "#7566ff"))
ggsave("ord_RDA.pdf", plot = ord_RDA_plot, path = "output/plots", width = 15, height = 15, units = "cm")
ord_PCA_plot <- ord_PCA$plot + scale_shape_manual(values=seq(0,10)) + scale_colour_manual(values = c("#7a255d", "#9fd0cb", "#7566ff"))
ggsave("ord_PCA.pdf", plot = ord_PCA_plot, path = "output/plots", width = 15, height = 15, units = "cm")
ord_NMDS_plot <- ord_NMDS$plot + scale_shape_manual(values=seq(0,10)) + scale_colour_manual(values = c("#7a255d", "#9fd0cb", "#7566ff"))
ggsave("ord_NMDS.pdf", plot = ord_NMDS_plot, path = "output/plots", width = 15, height = 15, units = "cm")
ord_PCOA_bray_plot <- ord_PCOA_bray$plot + scale_shape_manual(values=seq(0,10)) + scale_colour_manual(values = c("#7a255d", "#9fd0cb", "#7566ff"))
ggsave("ord_PCOA_bray.pdf", plot = ord_PCOA_bray_plot, path = "output/plots", width = 15, height = 15, units = "cm")
ord_PCOA_jac_plot <- ord_PCOA_jac$plot + scale_shape_manual(values=seq(0,10)) + scale_colour_manual(values = c("#7a255d", "#9fd0cb", "#7566ff"))
ggsave("ord_PCOA_jac.pdf", plot = ord_PCOA_jac_plot, path = "output/plots", width = 15, height = 15, units = "cm")
ord_MMDS_bray_plot <- ord_MMDS_bray$plot + scale_shape_manual(values=seq(0,10)) + scale_colour_manual(values = c("#7a255d", "#9fd0cb", "#7566ff"))
ggsave("ord_MMDS_bray.pdf", plot = ord_MMDS_bray_plot, path = "output/plots", width = 15, height = 15, units = "cm")
ord_MMDS_jac_plot <- ord_MMDS_jac$plot + scale_shape_manual(values=seq(0,10)) + scale_colour_manual(values = c("#7a255d", "#9fd0cb", "#7566ff"))
ggsave("ord_MMDS_jac.pdf", plot = ord_MMDS_jac_plot, path = "output/plots", width = 15, height = 15, units = "cm")
# Merge the two constrained analysis methods into one figrue
ordination_CCA_RDA <- ggarrange(ord_CCA_plot, ord_RDA_plot,
                    labels = c("A", "B"),
                    ncol = 2, nrow = 1)
ggsave("ordination_CCA_RDA.pdf", plot = ordination_CCA_RDA, path = "output/plots", width = 30, height = 10, units = "cm")

Repeat the constrained analysis on subsetted host data

#Subset black rat samples and filter out samples with <1000 reads after QC
amp_BR1000 <- amp_subset_samples(amp_samp,
                                 species %in% c("Black rat"),
                                 minreads = 1000,
                                 RemoveAbsents = TRUE)
# Canonical Correspondence Analysis (considered the constrained version of CA)
BRord_CCA <- amp_ordinate(amp_BR1000,
                        type = "CCA",
                        constrain = "SampleType",
                        transform = "Hellinger",
                        sample_color_by = "SampleType",
                        sample_colorframe = TRUE,
                        sample_colorframe_label = "SampleType",
                        detailed_output = TRUE)
# RDA  Redundancy Analysis (considered the constrained version of PCA)
BRord_RDA <- amp_ordinate(amp_BR1000,
                        type = "RDA",
                        constrain = "SampleCategory",
                        transform = "Hellinger",
                        sample_color_by = "SampleCategory",
                        sample_colorframe = TRUE,
                        sample_colorframe_label = "SampleCategory",
                        detailed_output = TRUE)

Constrained ordination of tick species

# Use tick dataset and filter out samples with <1000 reads after QC
amp_tick1000 <- amp_subset_samples(amp_tick,
                                 minreads = 1000,
                                 RemoveAbsents = TRUE)
# Canonical Correspondence Analysis (considered the constrained version of CA)
tickord_CCA <- amp_ordinate(amp_tick1000,
                        type = "CCA",
                        constrain = "tick_sp_short",
                        transform = "Hellinger",
                        sample_color_by = "tick_sp_short",
                        sample_colorframe = TRUE,
                        sample_colorframe_label = "tick_sp_short",
                        detailed_output = TRUE)
# RDA  Redundancy Analysis (considered the constrained version of PCA)
tickord_RDA <- amp_ordinate(amp_tick1000,
                        type = "RDA",
                        constrain = "tick_sp_short",
                        transform = "Hellinger",
                        sample_color_by = "tick_sp_short",
                        sample_colorframe = TRUE,
                        sample_colorframe_label = "tick_sp_short",
                        detailed_output = TRUE)

Customise plots and combine to one figure

# Use viridis to get colours
# tickspcols = viridis(7, option = "B")

# optimise colours
tickspcols = c("#000004FF", "#330A5FFF", "#781C6DFF", "#BB3754FF", "#ED6925FF", "#FCB519FF", "#c5cc00")
tickord_CCApl = tickord_CCA$plot + scale_colour_manual(values = tickspcols) + theme(legend.position="none")
tickord_RDApl = tickord_RDA$plot + scale_colour_manual(values = tickspcols) + theme(legend.position="none")
# Merge plots and save
tick_ord <- ggarrange(tickord_CCApl, tickord_RDApl,
                    labels = c("A", "B"),
                    ncol = 2, nrow = 1)
ggsave("tick_ord.pdf", plot = tick_ord, path = "output/plots", width = 25, height = 15, units = "cm")

Zoom in on tick species for CCA ordination

# Use tick dataset and filter out samples with <1000 reads after QC
amp_ticksub1000 <- amp_subset_samples(amp_tick,
                                   tick_sp_short %in% c("Ixtas", "Ixhol", "Ixtri", "Ixhol;Ixtri"),
                                   minreads = 1000,
                                   RemoveAbsents = TRUE)
# Canonical Correspondence Analysis (considered the constrained version of CA)
tickordsub_CCA <- amp_ordinate(amp_ticksub1000,
                        type = "CCA",
                        constrain = "tick_sp_short",
                        transform = "Hellinger",
                        sample_color_by = "tick_sp_short",
                        sample_colorframe = TRUE,
                        sample_colorframe_label = "tick_sp_short",
                        detailed_output = TRUE)
tickordsub_RDA <- amp_ordinate(amp_ticksub1000,
                        type = "RDA",
                        constrain = "tick_sp_short",
                        transform = "Hellinger",
                        sample_color_by = "tick_sp_short",
                        sample_colorframe = TRUE,
                        sample_colorframe_label = "tick_sp_short",
                        detailed_output = TRUE)

Customise plots and combine to one figure

# Use viridis to get colours
# tickspcols = viridis(7, option = "B")

# optimise colours
tickspcols2 = c("#BB3754FF", "#ED6925FF", "#FCB519FF", "#c5cc00")
tickordsub_CCApl = tickordsub_CCA$plot + scale_colour_manual(values = tickspcols2) + theme(legend.position="none")
tickordsub_RDApl = tickordsub_RDA$plot + scale_colour_manual(values = tickspcols2) + theme(legend.position="none")
# Merge plots and save
ticksub_ord <- ggarrange(tickordsub_CCApl, tickordsub_RDApl,
                    labels = c("A", "B"),
                    ncol = 2, nrow = 1)
ggsave("ticksub_ord.pdf", plot = ticksub_ord, path = "output/plots", width = 25, height = 15, units = "cm")

5.2. Phyloseq

Ordination using the phyloseq package

#prune samples with low reads
ps_BR1 = prune_samples(sample_sums(ps_BR)>=1000, ps_BR)
bray_dist = phyloseq::distance(ps_BR1, method="bray", weighted=F)
BR_bray_PCoAordination = ordinate(ps_BR1, method="PCoA", distance=bray_dist)
BR_bray_PCoA <- plot_ordination(ps_BR1, BR_bray_PCoAordination, color="SampleCategory") + theme(aspect.ratio=1)
adonis(bray_dist ~ sample_data(ps_BR1)$SampleCategory)

#prune samples with low reads
ps_BR1 = prune_samples(sample_sums(ps_BR)>=1000, ps_BR)
eu_dist = phyloseq::distance(ps_BR1, method="euclidean", weighted=F)
BR_eu_PCoAordination = ordinate(ps_BR1, method="PCoA", distance=eu_dist)
BR_eu_PCoA <- plot_ordination(ps_BR1, BR_eu_PCoAordination, color="SampleCategory") + theme(aspect.ratio=1)
# Statistical analysis
adonis(eu_dist ~ sample_data(ps_BR1)$SampleCategory)

phyloseq ordination - tick samples

#prune samples with low reads
ps_tick = prune_samples(sample_sums(ps_bact_tick)>=1000, ps_bact_tick)
bray_dist = phyloseq::distance(ps_tick, method="bray", weighted=F)
tick_PCoAordination = ordinate(ps_tick, method="PCoA", distance=bray_dist)
tick_PCoAordination2 = plot_ordination(ps_tick, tick_PCoAordination, color="tick_sp_short", shape="instar") + theme(aspect.ratio=1)
adonis(bray_dist ~ sample_data(ps_tick)$species)

phyloseq ordination - Ix. holocyclus samples

ixhol = subset_samples(ps_bact_tick, tick_sp_short=="Ixhol")
#prune samples with low reads
ps_tick = prune_samples(sample_sums(ixhol)>=1000, ixhol)
bray_dist = phyloseq::distance(ps_tick, method="bray", weighted=F)
tick_PCoAordination = ordinate(ps_tick, method="PCoA", distance=bray_dist)
tick_PCoAordination2 = plot_ordination(ps_tick, tick_PCoAordination, color="tick_sp_short", shape="instar") + theme(aspect.ratio=1)
adonis(bray_dist ~ sample_data(ps_tick)$species)

phyloseq ordination - Ix. holocyclus, Ix. tasmani and Ix. trichosuri samples

ps_bact_tick_sub = subset_samples(ps_bact_tick, tick_sp_short=="Ixhol" | tick_sp_short=="Ixtri" | tick_sp_short=="Ixtas" | tick_sp_short=="Ixhol;Ixtri")
# prune samples with less than 1000 reads
ps_bact_tick_sub_pr = prune_samples(sample_sums(ps_bact_tick_sub)>=1000, ps_bact_tick_sub)

bray_dist = phyloseq::distance(ps_bact_tick_sub_pr, method="bray", weighted=F)
tick_PCoAordination = ordinate(ps_bact_tick_sub_pr, method="PCoA", distance=bray_dist)
tick_PCoAordination2 = plot_ordination(ps_bact_tick_sub_pr, tick_PCoAordination, color="tick_sp_short", shape="instar") + theme(aspect.ratio=1)
adonis(bray_dist ~ sample_data(ps_bact_tick_sub_pr)$tick_sp_short)

5.3. Statistical analysis

See tutorial here

Overall statistical analysis Differences by sample type - using both ANOVA and MANOVA methods

# prune samples with less than 1000 reads
ps_bact_pr = prune_samples(sample_sums(ps_bact_samp)>=1000, ps_bact_samp)
# Transform data to hellinger
pseq.rel <- microbiome::transform(ps_bact_pr, "hellinger")
# Pick relative abundances (compositional) and sample metadata 
otu <- abundances(pseq.rel)
meta <- meta(pseq.rel)

# samples x SampleCategory as input
permanova <- adonis(t(otu) ~ SampleCategory,
                    data = meta, permutations=999, method = "bray")


## statistics
print(as.data.frame(permanova$aov.tab)["SampleCategory", "Pr(>F)"])
dist <- vegdist(t(otu))
mod <- betadisper(dist, meta$SampleCategory)
### ANOVA - are groups different 
anova(betadisper(dist, meta$SampleCategory))
### PERMANOVA - which groups different
permutest(betadisper(dist, meta$SampleCategory), pairwise = TRUE)

Statistical comparison within sample types statistical analysis Differences by sample type - using both ANOVA and MANOVA methods

Blood samples - all

# prune samples with less than 1000 reads
ps_bact_bl_pr = prune_samples(sample_sums(ps_bact_bl)>=1000, ps_bact_bl)
# Transform data to hellinger
pseq.rel <- microbiome::transform(ps_bact_bl_pr, "hellinger")
# Pick relative abundances (compositional) and sample metadata 
otu <- abundances(pseq.rel)
meta <- meta(pseq.rel)

# samples x species as input
permanova <- adonis(t(otu) ~ species,
                    data = meta, permutations=999, method = "bray")


## statistics
print(as.data.frame(permanova$aov.tab)["species", "Pr(>F)"])
dist <- vegdist(t(otu))
mod <- betadisper(dist, meta$species)
### ANOVA - are groups different 
anova(betadisper(dist, meta$species))
### PERMANOVA - which groups different
permutest(betadisper(dist, meta$species), pairwise = TRUE)

Tick samples - all

# prune samples with less than 1000 reads
ps_bact_tick_pr = prune_samples(sample_sums(ps_bact_tick)>=1000, ps_bact_tick)
# Transform data to hellinger
pseq.rel <- microbiome::transform(ps_bact_tick_pr, "hellinger")
# Pick relative abundances (compositional) and sample metadata 
otu <- abundances(pseq.rel)
meta <- meta(pseq.rel)

# samples x species as input
permanova <- adonis(t(otu) ~ species,
                    data = meta, permutations=999, method = "bray")


## statistics
print(as.data.frame(permanova$aov.tab)["species", "Pr(>F)"])
dist <- vegdist(t(otu))
mod <- betadisper(dist, meta$species)
### ANOVA - are groups different 
anova(betadisper(dist, meta$species))
### PERMANOVA - which groups different
permutest(betadisper(dist, meta$species), pairwise = TRUE)

Tissue samples - all

# prune samples with less than 1000 reads
ps_bact_tis_pr = prune_samples(sample_sums(ps_bact_tis)>=1000, ps_bact_tis)
# Transform data to hellinger
pseq.rel <- microbiome::transform(ps_bact_tis_pr, "hellinger")
# Pick relative abundances (compositional) and sample metadata 
otu <- abundances(pseq.rel)
meta <- meta(pseq.rel)

# samples x species as input
permanova <- adonis(t(otu) ~ species,
                    data = meta, permutations=999, method = "bray")


## statistics
print(as.data.frame(permanova$aov.tab)["species", "Pr(>F)"])
dist <- vegdist(t(otu))
mod <- betadisper(dist, meta$species)
### ANOVA - are groups different 
anova(betadisper(dist, meta$species))
### PERMANOVA - which groups different
permutest(betadisper(dist, meta$species), pairwise = TRUE)

Statistical comparison between tick species Differences by sample type - using both ANOVA and MANOVA methods

Tick samples - all

# prune samples with less than 1000 reads
ps_bact_tick_pr = prune_samples(sample_sums(ps_bact_tick)>=1000, ps_bact_tick)
# Transform data to hellinger
pseq.rel <- microbiome::transform(ps_bact_tick_pr, "hellinger")
# Pick relative abundances (compositional) and sample metadata 
otu <- abundances(pseq.rel)
meta <- meta(pseq.rel)

# samples x tick species as input
permanova <- adonis(t(otu) ~ tick_sp_short,
                    data = meta, permutations=999, method = "bray")


## statistics
print(as.data.frame(permanova$aov.tab)["tick_sp_short", "Pr(>F)"])
dist <- vegdist(t(otu))
mod <- betadisper(dist, meta$tick_sp_short)
### ANOVA - are groups different 
anova(betadisper(dist, meta$tick_sp_short))
### PERMANOVA - which groups different
permutest(betadisper(dist, meta$tick_sp_short), pairwise = TRUE)

Tick samples - subset of just sympatric Northern Beaches samples

# subset select tick species
ps_bact_tick_sub = subset_samples(ps_bact_tick, tick_sp_short=="Ixhol" | tick_sp_short=="Ixtri" | tick_sp_short=="Ixtas" | tick_sp_short=="Ixhol;Ixtri")
# prune samples with less than 1000 reads
ps_bact_tick_sub_pr = prune_samples(sample_sums(ps_bact_tick_sub)>=1000, ps_bact_tick_sub)
# Transform data to hellinger
pseq.rel <- microbiome::transform(ps_bact_tick_sub_pr, "hellinger")
# Pick relative abundances (compositional) and sample metadata 
otu <- abundances(pseq.rel)
meta <- meta(pseq.rel)

# samples x tick species as input
permanova <- adonis(t(otu) ~ tick_sp_short,
                    data = meta, permutations=999, method = "bray")


## statistics
print(as.data.frame(permanova$aov.tab)["tick_sp_short", "Pr(>F)"])
dist <- vegdist(t(otu))
mod <- betadisper(dist, meta$tick_sp_short)
### ANOVA - are groups different 
anova(betadisper(dist, meta$tick_sp_short))
### PERMANOVA - which groups different
permutest(betadisper(dist, meta$tick_sp_short), pairwise = TRUE)

6. Beta-diversity plot

Documentation here

Create matrix of beta diversity between samples.

betadiv <- amp_betadiv(amp_subset,
  distmeasure = "bray",
  label_by = "Instar",
  show_values = FALSE,
  values_size = 3
)
betadiv

7. Hierarchical cluster analysis

Function from MicrobiotaProcess.

Beta diversity metrics can assess the differences between microbial communities. It can be visualized with PCA or PCoA, this can also be visualized with hierarchical clustering. MicrobiotaProcess also implements the analysis based on ggtree.

## All samples - detailed, include species and SampleCategory
clust_all <- get_clust(obj=ps_bact_samp, distmethod="euclidean",
                      method="hellinger", hclustmethod="average")
speciescol = viridis(10, option = "D")
# circular layout
clust_all_plot <- ggclust(obj=clust_all ,
                      layout = "circular",
                      pointsize=1,
                      fontsize=0,
                      factorNames=c("species","SampleCategory")) + scale_color_manual(values=speciescol) +  scale_shape_manual(values=c(17, 15, 16)) + ggtitle("Hierarchical Cluster of All Samples (euclidean)")
# Save plot
ggsave("clust_all_plot.pdf", plot = clust_all_plot , path = "output/plots", width = 25, height = 25, units = "cm")

### All samples - simple version, only colour SampleCategory
ColSampCat = c("#7a255d", "#9fd0cb", "#7566ff")
# circular layout
clust_all_plot2 <- ggclust(obj=clust_all ,
                      layout = "circular",
                      pointsize=1,
                      fontsize=0,
                      factorNames=c("SampleCategory")) + scale_color_manual(values=ColSampCat) +  ggtitle("Hierarchical Cluster of All Samples (euclidean)")
# Save plot
ggsave("clust_all_plot2.pdf", plot = clust_all_plot2 , path = "output/plots", width = 25, height = 25, units = "cm")


# Blood samples
clust_bl <- get_clust(obj=ps_bact_bl, distmethod="euclidean",
                      method="hellinger", hclustmethod="average")
speciesblcol = viridis(7, option = "D")
# circular layout
clust_bl_plot <- ggclust(obj=clust_bl ,
                      layout = "circular",
                      pointsize=3,
                      fontsize=0,
                      factorNames=c("species")) + scale_color_manual(values=speciesblcol) + ggtitle("Hierarchical Cluster of Blood Samples (euclidean)")
# Save plot
ggsave("clust_bl_plot.pdf", plot = clust_bl_plot , path = "output/plots", width = 25, height = 25, units = "cm")

## Tick samples
clust_tick <- get_clust(obj=ps_bact_tick, distmethod="euclidean",
                      method="hellinger", hclustmethod="average")
speciestickcol = viridis(8, option = "D")
# circular layout
clust_tick_plot <- ggclust(obj=clust_tick ,
                      layout = "circular",
                      pointsize=2,
                      fontsize=0,
                      factorNames=c("species","tick_sp_short")) + scale_color_manual(values=speciestickcol) +  scale_shape_manual(values=c(17, 8, 25, 18, 7, 15, 16)) + ggtitle("Hierarchical Cluster of Tick Samples (euclidean)")
# Save plot
ggsave("clust_tick_plot.pdf", plot = clust_tick_plot , path = "output/plots", width = 25, height = 25, units = "cm")

### Tissue samples
clust_tis <- get_clust(obj=ps_bact_tis, distmethod="euclidean",
                      method="hellinger", hclustmethod="average")
speciestiscol = viridis(9, option = "D")
## circular layout
clust_tis_plot <- ggclust(obj=clust_tis ,
                      layout = "circular",
                      pointsize=3,
                      fontsize=0,
                      factorNames=c("species")) + scale_color_manual(values=speciestiscol) + ggtitle("Hierarchical Cluster of Tissue Samples (euclidean)")
# Save plot
ggsave("clust_tis_plot.pdf", plot = clust_tis_plot , path = "output/plots", width = 25, height = 25, units = "cm")

Tree for taxa of interest

Load additional libraries

library(ggridges)
library(scales)
library(ggtree)

Simple tree from tutorial here

# group by genus
taxa_of_interest_g = tax_glom(taxa_of_interest, "Genus")
# plot tree
tree1 = plot_tree(taxa_of_interest_g, color="species", shape ="SampleCategory", size="abundance")

From tutorial here

tree2 = ggtree(taxa_of_interest, layout='circular') + geom_text2(aes(subset=!isTip, label=label), hjust=-.2, size=4) +
  geom_tiplab(aes(label=Genus), hjust=-.3) +
  geom_point(aes(x=x+hjust, color=species, shape=Family),na.rm=TRUE) +
  scale_size_continuous(trans=log_trans(5)) +
  theme(legend.position="right") + scale_shape_manual(values=c(17, 15, 16, 0, 1, 2, 10))

library(ggtreeExtra)
library(ggtree)
library(phyloseq)
library(dplyr)

data("GlobalPatterns")
GP <- GlobalPatterns
GP <- prune_taxa(taxa_sums(GP) > 600, GP)
sample_data(GP)$human <- get_variable(GP, "SampleType") %in%
                              c("Feces", "Skin")
mergedGP <- merge_samples(GP, "SampleType")
mergedGP <- rarefy_even_depth(taxa_of_interest,rngseed=394582)
mergedGP <- tax_glom(mergedGP,"Order")

melt_simple <- psmelt(taxa_of_interest) %>%
               filter(Abundance < 120) %>%
               select(OTU, val=Abundance)

p <- ggtree(taxa_of_interest, layout="fan", open.angle=10) + 
     geom_tippoint(mapping=aes(color=Family), 
                   size=1.5,
                   show.legend=FALSE)
p <- rotate_tree(p, -90)

p <- tree2 +
     geom_fruit(
         data=melt_simple,
         geom=geom_boxplot,
         mapping = aes(
                     y=OTU,
                     x=val,
                     group=label,
                     fill=Family,
                   ),
         size=.2,
         outlier.size=0.5,
         outlier.stroke=0.08,
         outlier.shape=21,
         axis.params=list(
                         axis       = "x",
                         text.size  = 1.8,
                         hjust      = 1,
                         vjust      = 0.5,
                         nbreak     = 3,
                     ),
         grid.params=list()
     ) 
     
p <- p +
     scale_fill_discrete(
         name="Family",
         guide=guide_legend(keywidth=0.8, keyheight=0.8, ncol=1)
     ) +
     theme(
         legend.title=element_text(size=9), # The title of legend 
         legend.text=element_text(size=7) # The label text of legend, the sizes should be adjust with dpi.
     )
p

sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] MicrobiotaProcess_1.2.2     eulerr_6.1.0               
 [3] Biostrings_2.58.0           XVector_0.30.0             
 [5] patchwork_1.1.1             knitr_1.31                 
 [7] microbiomeutilities_1.00.12 vegan_2.5-7                
 [9] lattice_0.20-41             permute_0.9-5              
[11] DESeq2_1.30.1               SummarizedExperiment_1.20.0
[13] Biobase_2.50.0              MatrixGenerics_1.2.1       
[15] matrixStats_0.58.0          GenomicRanges_1.42.0       
[17] GenomeInfoDb_1.26.2         IRanges_2.24.1             
[19] S4Vectors_0.28.1            BiocGenerics_0.36.0        
[21] ape_5.4-1                   data.table_1.14.0          
[23] decontam_1.10.0             reshape_0.8.8              
[25] microbiome_1.13.9           MicrobeR_0.3.2             
[27] cowplot_1.1.1               viridis_0.5.1              
[29] viridisLite_0.3.0           plotly_4.9.3               
[31] agricolae_1.3-3             ggpubr_0.4.0               
[33] ampvis2extras_0.1.5         ampvis2_2.6.7              
[35] forcats_0.5.1               stringr_1.4.0              
[37] dplyr_1.0.5                 purrr_0.3.4                
[39] readr_1.4.0                 tidyr_1.1.2                
[41] tibble_3.1.0                ggplot2_3.3.3              
[43] tidyverse_1.3.0             phyloseq_1.34.0            
[45] qiime2R_0.99.4              workflowr_1.6.2            

loaded via a namespace (and not attached):
  [1] bit64_4.0.5            multcomp_1.4-16        DelayedArray_0.16.2   
  [4] rpart_4.1-15           RCurl_1.98-1.2         doParallel_1.0.16     
  [7] generics_0.1.0         TH.data_1.0-10         RSQLite_2.2.3         
 [10] combinat_0.0-8         bit_4.0.4              xml2_1.3.2            
 [13] lubridate_1.7.10       httpuv_1.5.5           assertthat_0.2.1      
 [16] xfun_0.21              hms_1.0.0              jquerylib_0.1.3       
 [19] evaluate_0.14          promises_1.2.0.1       Rmisc_1.5             
 [22] fansi_0.4.2            progress_1.2.2         dbplyr_2.1.0          
 [25] readxl_1.3.1           igraph_1.2.6           DBI_1.1.1             
 [28] geneplotter_1.68.0     htmlwidgets_1.5.3      ellipsis_0.3.1        
 [31] backports_1.2.1        picante_1.8.2          annotate_1.68.0       
 [34] libcoin_1.0-8          vctrs_0.3.7            abind_1.4-5           
 [37] cachem_1.0.4           withr_2.4.1            checkmate_2.0.0       
 [40] treeio_1.14.3          prettyunits_1.1.1      cluster_2.1.1         
 [43] lazyeval_0.2.2         crayon_1.4.1           genefilter_1.72.1     
 [46] pkgconfig_2.0.3        zCompositions_1.3.4    nlme_3.1-152          
 [49] nnet_7.3-15            rlang_0.4.10           questionr_0.7.4       
 [52] lifecycle_1.0.0        miniUI_0.1.1.1         sandwich_3.0-0        
 [55] modelr_0.1.8           cellranger_1.1.0       rprojroot_2.0.2       
 [58] Matrix_1.3-2           aplot_0.0.6            phangorn_2.5.5        
 [61] carData_3.0-4          zoo_1.8-8              Rhdf5lib_1.12.1       
 [64] reprex_1.0.0           base64enc_0.1-3        pheatmap_1.0.12       
 [67] whisker_0.4            png_0.1-7              bitops_1.0-6          
 [70] rhdf5filters_1.2.0     blob_1.2.1             coin_1.4-1            
 [73] jpeg_0.1-8.1           rstatix_0.7.0          DECIPHER_2.18.1       
 [76] ggsignif_0.6.1         rtk_0.2.6.1            klaR_0.6-15           
 [79] scales_1.1.1           memoise_2.0.0          magrittr_2.0.1        
 [82] plyr_1.8.6             gghalves_0.1.1         zlibbioc_1.36.0       
 [85] compiler_4.0.3         philr_1.16.0           RColorBrewer_1.1-2    
 [88] ggstar_1.0.1           cli_2.3.1              ade4_1.7-16           
 [91] htmlTable_2.1.0        Formula_1.2-4          MASS_7.3-53.1         
 [94] mgcv_1.8-34            tidyselect_1.1.0       stringi_1.5.3         
 [97] highr_0.8              yaml_2.2.1             locfit_1.5-9.4        
[100] latticeExtra_0.6-29    ggrepel_0.9.1          grid_4.0.3            
[103] sass_0.3.1             fastmatch_1.1-0        tools_4.0.3           
[106] rio_0.5.26             rstudioapi_0.13        foreach_1.5.1         
[109] foreign_0.8-81         git2r_0.28.0           gridExtra_2.3         
[112] Rtsne_0.15             digest_0.6.27          rvcheck_0.1.8         
[115] BiocManager_1.30.10    shiny_1.6.0            quadprog_1.5-8        
[118] Rcpp_1.0.6             car_3.0-10             broom_0.7.5           
[121] later_1.1.0.1          httr_1.4.2             AnnotationDbi_1.52.0  
[124] colorspace_2.0-0       rvest_0.3.6            XML_3.99-0.5          
[127] fs_1.5.0               truncnorm_1.0-8        splines_4.0.3         
[130] tidytree_0.3.3         multtest_2.46.0        xtable_1.8-4          
[133] jsonlite_1.7.2         ggtree_2.4.1           AlgDesign_1.2.0       
[136] modeltools_0.2-23      R6_2.5.0               Hmisc_4.5-0           
[139] NADA_1.6-1.1           pillar_1.5.1           htmltools_0.5.1.1     
[142] mime_0.10              glue_1.4.2             fastmap_1.1.0         
[145] DT_0.17                BiocParallel_1.24.1    ggnet_0.1.0           
[148] codetools_0.2-18       mvtnorm_1.1-1          utf8_1.2.1            
[151] bslib_0.2.4.9001       network_1.16.1         curl_4.3              
[154] gtools_3.8.2           zip_2.1.1              openxlsx_4.2.3        
[157] survival_3.2-7         rmarkdown_2.7.2        biomformat_1.18.0     
[160] munsell_0.5.0          rhdf5_2.34.0           GenomeInfoDbData_1.2.4
[163] iterators_1.0.13       labelled_2.7.0         haven_2.3.1           
[166] reshape2_1.4.4         gtable_0.3.0

Project website by Siobhon L. Egan, 2021. This site was created in R Markdown with workflowr