Last updated: 2021-07-03
Checks: 7 0
Knit directory: wildlife-bacteria/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20210129)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version af6ad5b. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: analysis/_footer.html
Ignored: data/.DS_Store
Ignored: output/.DS_Store
Ignored: output/plots/.DS_Store
Ignored: output/plots/QC/.DS_Store
Ignored: output/plots/boxplots_select_taxa/.DS_Store
Ignored: output/plots/heatmaps/.DS_Store
Ignored: output/plots/maps/.DS_Store
Ignored: output/plots/tax_prev_abund/.DS_Store
Untracked files:
Untracked: ENA_docs/
Untracked: NCBI_data/
Untracked: data/dada2/
Untracked: data/dada2_tois/
Untracked: data/taxa_trees/
Untracked: data/tmp/
Untracked: output/beta-div-statistics.txt
Untracked: output/supp_table_pos.xlsx
Untracked: tmp/
Unstaged changes:
Modified: RData/amp_dec_bact.RData
Modified: RData/amp_raw_bact.RData
Modified: RData/ps_dec_bact.RData
Modified: RData/ps_raw_bact.RData
Modified: README.md
Deleted: analysis/site-map.Rmd
Deleted: analysis/tois.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/phyloseq.Rmd
) and HTML (docs/phyloseq.html
) files. If you’ve configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | af6ad5b | siobhon-egan | 2021-07-03 | general update |
html | e926c9d | siobhon-egan | 2021-07-02 | fix conflicts |
Rmd | 33466dd | siobhon-egan | 2021-07-01 | update inc accession no |
html | 33466dd | siobhon-egan | 2021-07-01 | update inc accession no |
html | 5f4c86d | siobhon-egan | 2021-05-23 | restructure pages |
Rmd | 0d602a3 | siobhon-egan | 2021-05-23 | updates to pages |
html | 0d602a3 | siobhon-egan | 2021-05-23 | updates to pages |
html | a69dea3 | siobhon-egan | 2021-04-24 | Build site. |
Rmd | 6ebcc5d | siobhon-egan | 2021-04-24 | updates |
html | 6ebcc5d | siobhon-egan | 2021-04-24 | updates |
html | 5486f7e | siobhon-egan | 2021-02-26 | Build site. |
Rmd | ebe20d0 | siobhon-egan | 2021-02-26 | add qiime and phyloseq pages |
Visualization and analysis of microbiome data.
Note this was run using R version 4.0.3 and RStudo version 1.4. See R info for full details of R session.
See Visualization page for details on data visualization.
Install libraries if required
Only need to run this code once.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
# phyloseq
source('http://bioconductor.org/biocLite.R')
biocLite('phyloseq')
#tidyverse
install. packages("tidyverse")
#ampvis2
install.packages("remotes")
remotes::install_github("MadsAlbertsen/ampvis2")
#ampvis2extras
install.packages("BiocManager")
BiocManager::install("kasperskytte/ampvis2extras")
#ggpubr
install.packages("ggpubr")
#agricolae
install.packages("agricolae")
install.packages("remotes")
remotes::install_github("DanielSprockett/reltools")
devtools::install_github('jsilve24/philr')
#decontam
BiocManager::install("decontam")
library(decontam)
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("Biostrings")
Load libraries
pkgs <- c("qiime2R", "phyloseq", "tidyverse", "ampvis2", "ampvis2extras",
"ggpubr", "agricolae", "plotly", "viridis", "cowplot", "MicrobeR",
"microbiome", "reshape", "decontam", "data.table", "ape", "DESeq2",
"vegan", "microbiomeutilities", "knitr", "tibble", "dplyr",
"patchwork", "Biostrings")
lapply(pkgs, require, character.only = TRUE)
Generate phyloseq object from spreadsheets.
Import ASV/OTU count data
dada2_ASVs <- read_tsv("data/dada2/phyloseq/feature-table.tsv", skip = 1)
dada2_ASVs_lab = column_to_rownames(dada2_ASVs, var = "#OTU ID")
Import taxonomy data
taxonomy <- read_csv("data/dada2/phyloseq/taxonomy.csv")
taxonomy_lab = column_to_rownames(taxonomy, var = "#q2:types")
Add ASV sequences by reading in the .fasta
file.
# read sequence file
rep.seqs <- Biostrings::readDNAStringSet("data/dada2/phyloseq/all_rep-seqs.fasta", format = "fasta")
Add metadata, importing gDNAID, SampleType and SampleCategory as factor to be able to merge later on and perform statistical analysis
library(readr)
metadata <- read_csv("data/dada2/phyloseq/metadata.csv",
col_types = cols(
SampleType = col_factor(levels =c("categorical","Sample", "Control", "EcolSample")),
SampleCategory = col_factor(levels = c("categorical", "Blood", "Tissue", "Tick")),
gDNAID = col_factor(levels = c("categorical","Bl_1","Bl_10","Bl_102","Bl_113","Bl_114","Bl_115","Bl_116","Bl_117","Bl_118","Bl_119","Bl_120","Bl_121","Bl_122","Bl_123","Bl_124","Bl_125","Bl_126","Bl_127","Bl_128","Bl_129","Bl_130","Bl_131","Bl_132","Bl_133","Bl_134","Bl_135","Bl_136","Bl_137","Bl_138","Bl_139","Bl_140","Bl_141","Bl_142","Bl_143","Bl_144","Bl_145","Bl_146","Bl_147","Bl_149","Bl_150","Bl_151","Bl_152","Bl_153","Bl_154","Bl_155","Bl_156","Bl_157","Bl_158","Bl_159","Bl_160","Bl_161","Bl_162","Bl_163","Bl_164","Bl_18","Bl_19","Bl_25","Bl_27","Bl_28","Bl_31","Bl_37","Bl_43","Bl_49","Bl_52","Bl_54","Bl_55","Bl_6","Bl_61","Bl_63","Bl_66","Bl_7","Bl_73","Bl_75","Bl_78","Bl_79","Bl_80","Bl_82","Bl_86","Bl_94","Bl_98","Bl_NTC1","Bl_NTC2","Bl_NTC3","Bl_NTC4","Bl_NTC45","Bl_NTC46","Bl_NTC47","Bl_100","Bl_101","Bl_103","Bl_104","Bl_105","Bl_106","Bl_108","Bl_109","Bl_11","Bl_110","Bl_111","Bl_112","Bl_12","Bl_13","Bl_14","Bl_15","Bl_16","Bl_17","Bl_2","Bl_20","Bl_21","Bl_22","Bl_23","Bl_24","Bl_26","Bl_29","Bl_3","Bl_30","Bl_32","Bl_33","Bl_34","Bl_35","Bl_36","Bl_38","Bl_39","Bl_4","Bl_40","Bl_41","Bl_42","Bl_44","Bl_45","Bl_46","Bl_47","Bl_48","Bl_5","Bl_50","Bl_51","Bl_53","Bl_56","Bl_57","Bl_58","Bl_59","Bl_60","Bl_62","Bl_64","Bl_65","Bl_67","Bl_68","Bl_69","Bl_70","Bl_71","Bl_72","Bl_74","Bl_76","Bl_77","Bl_8","Bl_81","Bl_83","Bl_84","Bl_85","Bl_87","Bl_88","Bl_89","Bl_9","Bl_90","Bl_91","Bl_92","Bl_93","Bl_95","Bl_96","Bl_97","Bl_99","Bl_C001Q","Bl_C002M","Bl_C002Q","Bl_C003M","Bl_C003Q","Bl_exbM","Bl_exbQ","Bl_Q001M","Bl_Q001Q","Tis_indexNTC1","TIS_indexNTC2","Tis_indexNTC3","Tis_NTC5","Tis_NTC54","Tis_NTC55","Tis_NTC57","Tis_NTC58","Tis_NTC59","Tis_NTC6","Tis_NTC60","Tis_NTC7","Tis_NTC8","Tis_104","Tis_105","Tis_106","Tis_107","Tis_108","Tis_109","Tis_110","Tis_111","Tis_112","Tis_113","Tis_114","Tis_115","Tis_116","Tis_117","Tis_118","Tis_119","Tis_120","Tis_121","Tis_122","Tis_123","Tis_124","Tis_125","Tis_126","Tis_127","Tis_128","Tis_129","Tis_130","Tis_131","Tis_132","Tis_133","Tis_134","Tis_135","Tis_136","Tis_137","Tis_138","Tis_139","Tis_140","Tis_141","Tis_142","Tis_143","Tis_144","Tis_145","Tis_146","Tis_147","Tis_148","Tis_149","Tis_15","Tis_150","Tis_151","Tis_152","Tis_153","Tis_154","Tis_155","Tis_156","Tis_157","Tis_158","Tis_159","Tis_160","Tis_161","Tis_162","Tis_163","Tis_164","Tis_165","Tis_166","Tis_167","Tis_168","Tis_169","Tis_170","Tis_171","Tis_172","Tis_173","Tis_174","Tis_175","Tis_176","Tis_177","Tis_178","Tis_179","Tis_21","Tis_43","Tis_5","Tis_55","Tis_56","Tis_58","Tis_62","Tis_68","Tis_70","Tis_72","Tis_80","Tis_84","Tis_94","Tis_95","Tis_96","Tis_97","Tis_SC009","Tis_1","Tis_10","Tis_100","Tis_101","Tis_102","Tis_103","Tis_11","Tis_12","Tis_13","Tis_14","Tis_16","Tis_17","Tis_18","Tis_19","Tis_2","Tis_20","Tis_22","Tis_23","Tis_24","Tis_25","Tis_26","Tis_27","Tis_28","Tis_29","Tis_3","Tis_30","Tis_31","Tis_32","Tis_33","Tis_34","Tis_35","Tis_36","Tis_37","Tis_38","Tis_39","Tis_4","Tis_40","Tis_41","Tis_42","Tis_44","Tis_45","Tis_46","Tis_47","Tis_48","Tis_49","Tis_50","Tis_51","Tis_52","Tis_53","Tis_54","Tis_57","Tis_59","Tis_6","Tis_60","Tis_61","Tis_63","Tis_64","Tis_65","Tis_66","Tis_67","Tis_69","Tis_7","Tis_71","Tis_73","Tis_74","Tis_75","Tis_76","Tis_77","Tis_78","Tis_79","Tis_8","Tis_81","Tis_82","Tis_83","Tis_85","Tis_86","Tis_87","Tis_88","Tis_89","Tis_9","Tis_90","Tis_91","Tis_92","Tis_93","Tis_98","Tis_99","Tis_SC010","Tis_SC011","Tis_SCexb","Tk_1","Tk_100","Tk_101","Tk_102","Tk_103","Tk_107","Tk_108","Tk_119","Tk_120","Tk_122","Tk_131","Tk_134","Tk_136","Tk_141","Tk_143","Tk_144","Tk_148","Tk_149","Tk_150","Tk_151","Tk_152","Tk_153","Tk_154","Tk_155","Tk_156","Tk_157","Tk_158","Tk_159","Tk_160","Tk_161","Tk_162","Tk_163","Tk_164","Tk_165","Tk_166","Tk_167","Tk_168","Tk_169","Tk_17","Tk_170","Tk_171","Tk_172","Tk_173","Tk_174","Tk_175","Tk_176","Tk_177","Tk_178","Tk_179","Tk_180","Tk_181","Tk_182","Tk_183","Tk_184","Tk_185","Tk_186","Tk_187","Tk_188","Tk_189","Tk_19","Tk_190","Tk_191","Tk_192","Tk_193","Tk_194","Tk_195","Tk_196","Tk_197","Tk_198","Tk_199","Tk_20","Tk_200","Tk_201","Tk_202","Tk_203","Tk_204","Tk_205","Tk_206","Tk_207","Tk_208","Tk_209","Tk_210","Tk_211","Tk_212","Tk_213","Tk_214","Tk_215","Tk_216","Tk_217","Tk_218","Tk_219","Tk_220","Tk_221","Tk_222","Tk_223","Tk_224","Tk_225","Tk_27","Tk_3","Tk_34","Tk_38","Tk_4","Tk_40","Tk_46","Tk_47","Tk_48","Tk_49","Tk_50","Tk_51","Tk_52","Tk_53","Tk_54","Tk_56","Tk_57","Tk_59","Tk_60","Tk_61","Tk_62","Tk_64","Tk_68","Tk_71","Tk_72","Tk_73","Tk_74","Tk_76","Tk_85","Tk_86","Tk_89","Tk_90","Tk_91","Tk_95","Tk_96","Tk_97","Tk_98","Tk_83","Tk_2","Tk_18","Tk_35","Tk_84","Tk_37","Tk_5","Tk_22","Tk_39","Tk_63","Tk_75","Tk_87","Tk_99","Tk_6","Tk_24","Tk_88","Tk_8","Tk_26","Tk_41","Tk_65","Tk_77","Tk_9","Tk_42","Tk_66","Tk_78","Tk_11","Tk_29","Tk_43","Tk_55","Tk_79","Tk_12","Tk_31","Tk_44","Tk_67","Tk_80","Tk_92","Tk_104","Tk_13","Tk_32","Tk_45","Tk_69","Tk_81","Tk_93","Tk_105","Tk_15","Tk_33","Tk_58","Tk_70","Tk_82","Tk_94","Tk_ntc14","Tk_106","Tk_109","Tk_110","Tk_111","Tk_112","Tk_113","Tk_114","Tk_115","Tk_116","Tk_117","Tk_118","Tk_121","Tk_123","Tk_124","Tk_125","Tk_126","Tk_127","Tk_128","Tk_129","Tk_130","Tk_132","Tk_133","Tk_135","Tk_137","Tk_138","Tk_139","Tk_140","Tk_142","Tk_145","Tk_146","Tk_147","Tk_Ecol1","Tk_Ecol10","Tk_Ecol11","Tk_Ecol12","Tk_Ecol13","Tk_Ecol14","Tk_Ecol15","Tk_Ecol16","Tk_Ecol17","Tk_Ecol18","Tk_Ecol19","Tk_Ecol2","Tk_Ecol20","Tk_Ecol21","Tk_Ecol22","Tk_Ecol23","Tk_Ecol24","Tk_Ecol25","Tk_Ecol3","Tk_Ecol4","Tk_Ecol5","Tk_Ecol6","Tk_Ecol7","Tk_Ecol8","Tk_Ecol9","Tk_ntc15","Tk_ntc16","Tk_ntcecol1","Tk_ntcecol2"))))
###Not using tsv import anymore
#metadata <- read_tsv("../data/dada2/phyloseq/metadata.tsv")
# Remove second row which contains column data for QIIME2 format
metadata <- metadata[-c(1), ]
metadata_lab = column_to_rownames(metadata, var = "new_sample-id")
sampledata = sample_data(data.frame(metadata_lab))
sampledata
Renaming sample IDs
Renaming sample-ids for consistency and matching to ENA accession numbers.
OldName <- colnames(dada2_ASVs_lab)
NewName <- rownames(metadata_lab)
dada2_ASVs_lab_renamed <- dada2_ASVs_lab %>% rename_at(vars(OldName), ~ NewName)
# Make OTU matrix
otumat <- as.matrix(dada2_ASVs_lab_renamed)
# Make Tax matrix
taxmat <- as.matrix(taxonomy_lab)
Create phyloseq object
Check the class of the otumat and taxmat objects, they MUST be in matrix format. Then we can great a phyloseq object called physeq from the otu and taxonomy tables and check the sample names.
class(otumat)
class(taxmat)
OTU = otu_table(otumat, taxa_are_rows = TRUE)
TAX = tax_table(taxmat)
physeq = phyloseq(OTU, TAX)
physeq
sample_names(physeq)
Now you can merge your sequence and metadata to create a final phyloseq object
ps_raw_bact = merge_phyloseq(physeq, sampledata, rep.seqs)
Remove samples with NA values or not part of final data set,
ps_raw_bact <- subset_samples(ps_raw_bact, !SampleType=="EcolSample")
An easy way to view the tables is using Nice Tables
Nice.Table(ps_raw_bact@sam_data)
Nice.Table(ps_bact_samp@tax_table)
R package decontam
to assess contaminating OTUs, tutorial.
The CRAN version only works on R version <4. To install for R versions >4 install from bioconductor using the following
Make plot of library size of Samples vs Controls
df <- as.data.frame(sample_data(ps_raw_bact)) # Put sample_data into a ggplot-friendly data.frame
df$LibrarySize <- sample_sums(ps_raw_bact)
df <- df[order(df$LibrarySize),]
df$Index <- seq(nrow(df))
libQC <- ggplot(data=df, aes(x=Index, y=LibrarySize, color=SampleType)) + geom_point()
ggsave("libQC.pdf", plot = libQC, path = "output/plots", width = 10, height = 10, units = "cm")
Make html plot with plotly
libQCplotly <- ggplotly(libQC)
htmlwidgets::saveWidget(libQCplotly, "output/plots/libQCplotly.html")
Identify contaminating ASVs - define control samples and threshold (e.g. 0.05
)
df <- as.data.frame(sample_data(ps_raw_bact)) # Put sample_data into a ggplot-friendly data.frame
sample_data(ps_raw_bact)$is.neg <- sample_data(ps_raw_bact)$SampleType == "Control"
contamdf.prev <- isContaminant(ps_raw_bact, method="prevalence", neg="is.neg", threshold = 0.05)
Identify contaminants
table(contamdf.prev$contaminant)
head(which(contamdf.prev$contaminant))
table(contamdf.prev$contaminant)
con_ASVs <- contamdf.prev %>%
filter(contaminant == "TRUE")
con_ASVs <- rownames(con_ASVs)
Take a look at the number of times several of these taxa were observed in negative controls and positive samples.
# Make phyloseq object of presence-absence in negative controls and true samples
ps.pa <- transform_sample_counts(ps_raw_bact, function(abund) 1*(abund>0))
ps.pa.neg <- prune_samples(sample_data(ps_raw_bact)$SampleType == "Control", ps.pa)
ps.pa.pos <- prune_samples(sample_data(ps.pa)$SampleType == "Sample", ps.pa)
# Make data.frame of prevalence in positive and negative samples
df.pa <- data.frame(pa.pos=taxa_sums(ps.pa.pos), pa.neg=taxa_sums(ps.pa.neg),
contaminant=contamdf.prev$contaminant)
# Make plot and save as pdf
deconplot <- ggplot(data=df.pa, aes(x=pa.neg, y=pa.pos, color=contaminant)) + geom_point() +
xlab("Prevalence (Negative Controls)") + ylab("Prevalence (True Samples)")
ggsave("deconplot.pdf", plot = deconplot, path = "output/plots", width = 10, height = 10, units = "cm")
Make distribution plot of reads using microbiomeutilities
distrib <- plot_read_distribution(ps_raw_bact, groups = "SampleCategory",
plot.type = "density") + xlab("Reads per sample") + ylab("Density")
distrib <- distrib + geom_density(alpha = 0.5, fill = "grey")
ggsave("distrib.pdf", plot = distrib, path = "output/plots", width = 15, height = 10, units = "cm")
As determined by decontam
methods identify contaminant ASVs and threshold for positive reads. Then transform otu count data.
ps_dec_bact <- prune_taxa(!contamdf.prev$contaminant, ps_raw_bact)
ps_dec_bact@otu_table [, 1:702][ps_dec_bact@otu_table [, 1:702] < 100] <- 0
Save R data for phyloseq object - saving “raw data” (ps_raw_bact
) and “decontaminated data” (ps_dec_bact
)
save(ps_raw_bact, file = "RData/ps_raw_bact.RData")
save(ps_dec_bact, file = "RData/ps_dec_bact.RData")
To load raw and decon data quickly from .RData
format.
load("RData/ps_raw_bact.RData")
load("RData/ps_dec_bact.RData")
Subset phyloseq object based on sample types
# Exclude controls
ps_bact_samp = subset_samples(ps_dec_bact, !SampleType=="Control")
# Blood samples only
ps_bact_bl = subset_samples(ps_bact_samp, SampleCategory=="Blood")
# Tissue samples only
ps_bact_tis = subset_samples(ps_bact_samp, SampleCategory=="Tissue")
# Tick samples only
ps_bact_tick = subset_samples(ps_bact_samp, SampleCategory=="Tick")
Subset for taxa of interest - family level
# taxa of interest
ps_toi_fam = subset_taxa(ps_bact_samp, Family=="Coxiellaceae" | Family=="Mycoplasmataceae" | Family=="Bartonellaceae" | Family=="Francisellaceae" | Family=="Borreliaceae" | Family=="Anaplasmataceae" | Family=="Midichloriaceae" | Family =="Mycobacteriaceae" | Family=="Rickettsiaceae")
ps_toi_fam = prune_taxa(taxa_sums(ps_toi_fam) > 0, ps_toi_fam)
writeXStringSet(ps_toi_fam@refseq, "data/dada2_tois/ps_toi_fam.fasta")
melt_toi_fam = psmelt(ps_toi_fam)
melt_toi_fam = subset(melt_toi_fam, Abundance > 0)
write.csv(melt_toi_fam, "data/dada2_tois/melt_toi_fam.csv")
Subset for taxa of interest - genus
# taxa of interest
ps_toi_gen = subset_taxa(ps_bact_samp, Family=="Coxiellaceae" | Genus=="Mycoplasma" | Genus=="Bartonella" | Genus=="Francisella" | Genus=="Borrelia" | Genus=="Rickettsia" | Genus=="Neoehrlichia" | Genus=="Ehrlichia" | Genus=="Anaplasma" | Genus=="Candidatus Midichloria")
ps_toi_gen = prune_taxa(taxa_sums(ps_toi_gen) > 0, ps_toi_gen)
writeXStringSet(ps_toi_gen@refseq, "data/dada2_tois/ps_toi_gen.fasta")
melt_toi_gen = psmelt(ps_toi_gen)
melt_toi_gen = subset(melt_toi_gen, Abundance > 0)
write.csv(melt_toi_gen, "data/dada2_tois/melt_toi_gen.csv")
Outside of RStudio
Align sequences and produce quick neighbour joining tree using muscle64. Code for command line terminal with muscle64
installed in the $PATH
# Family taxa of interest
muscle64 -in data/dada2_tois/ps_toi_fam.fasta -out data/dada2_tois/ps_toi_fam_musaln.fasta
muscle64 -maketree -in data/dada2_tois/ps_toi_fam_musaln.fasta -out data/dada2_tois/ps_toi_fam_tree.phy -cluster neighborjoining
# Genus taxa of interest
muscle64 -in data/dada2_tois/ps_toi_gen.fasta -out data/dada2_tois/ps_toi_gen_musaln.fasta
muscle64 -maketree -in data/dada2_tois/ps_toi_gen_musaln.fasta -out data/dada2_tois/ps_toi_gen_tree.phy -cluster neighborjoining
Now back to RStudio
Add tree for this taxa
tree_toi_fam
tree_toi_fam <- read_tree("../data/dada2_tois/ps_toi_fam_tree.phy")
# Merge tree object into phyloseq to create tree_toi_fam object
ps_toi_fam <- merge_phyloseq(ps_toi_fam, tree_toi_fam)
tree_toi_gen
tree_toi_gen <- read_tree("../data/dada2_tois/ps_toi_gen_tree.phy")
# Merge tree object into phyloseq to create tree_toi_gen object
ps_toi_gen <- merge_phyloseq(ps_toi_gen, tree_toi_gen)
More subsetting
Subset phyloseq object based on host species
# Black rat
ps_BR = subset_samples(ps_bact_samp, species=="Black rat")
# Brush tail possum
ps_BTP = subset_samples(ps_bact_samp, species=="Brush tail possum")
# Chuditch
ps_chud = subset_samples(ps_bact_samp, species=="Chuditch")
# Long-nosed bandicoot
ps_LNB = subset_samples(ps_bact_samp, species=="Long-nosed bandicoot")
Make ampvis2 object for analysis
#require the devtools package to source gist
if(!require("devtools"))
install.packages("devtools")
#source the phyloseq_to_ampvis2() function from the gist
devtools::source_gist("8d0ca4206a66be7ff6d76fc4ab8e66c6")
# convert
amp_raw_bact <- phyloseq_to_ampvis2(ps_raw_bact)
amp_dec_bact <- phyloseq_to_ampvis2(ps_dec_bact)
amp_toi_fam <- phyloseq_to_ampvis2(ps_toi_fam)
amp_toi_gen <- phyloseq_to_ampvis2(ps_toi_gen)
Save R data for ampvis2 object - saving “decontam ampvis2 data” (bact_amp
)
# raw ampvis2
save(amp_raw_bact, file = "RData/amp_raw_bact.RData")
# decontam ampvis2
save(amp_dec_bact, file = "RData/amp_dec_bact.RData")
Subset ampvis2 object based on sample category
#remove controls
amp_samp <- amp_subset_samples(amp_dec_bact,
!SampleType %in% c("Control"),
RemoveAbsents = TRUE)
#blood samples
amp_bl <- amp_subset_samples(amp_samp,
SampleCategory %in% c("Blood"),
RemoveAbsents = TRUE)
#tissue samples
am_tis <- amp_subset_samples(amp_samp,
SampleCategory %in% c("Tissue"),
RemoveAbsents = TRUE)
#tick samples
amp_tick <- amp_subset_samples(amp_samp,
SampleCategory %in% c("Tick"),
RemoveAbsents = TRUE)
#tick samples
amp_tick_sub <- amp_subset_samples(amp_tick,
instar %in% c("Female", "Larvae", "Nymph"),
RemoveAbsents = TRUE)
Ampvis2 have an easy way to merge replicate samples (i.e. duplicate PCR amplicons)
#merge by gDNAID
dmerged <- amp_mergereplicates(ps_bact_samp,
merge_var = "gDNAID",
round = "up"
)
dmerged$metadata
Filter out low abundant samples
amp_samp1000 <- amp_subset_samples(amp_samp,
minreads = 1000,
RemoveAbsents = TRUE)
amp_samp1000
Subset certain species and sites
chud <- amp_subset_samples(amp_samp,
species %in% c("Chuditch"),
!SampleCategory %in% c("Tick"),
RemoveAbsents = TRUE)
LNB <- amp_subset_samples(amp_samp,
species %in% c("Long-nosed bandicoot"),
RemoveAbsents = TRUE)
BTP <- amp_subset_samples(amp_samp,
species %in% c("Brush tail possum"),
RemoveAbsents = TRUE)
BR <- amp_subset_samples(amp_samp,
species %in% c("Black rat"),
RemoveAbsents = TRUE)
Set colours when comparing blood, tick and tissue samples.
ColSampCat = c("#7a255d", "#9fd0cb", "#7566ff")
Subset phyloseq object and convert to ‘long’ dataframe to quickly identify samples.
Identify unidentified taxa at kingdom level
# subset taxa to include those with no Kingdom assignment
kingdom_na = subset_taxa(ps_bact_samp, is.na(Kingdom))
# remove taxa with 0 abundance
kingdom_na <- prune_taxa(taxa_sums(kingdom_na) > 0, kingdom_na)
# write sequences to fasta file
writeXStringSet(kingdom_na@refseq, "data/dada2/unclassified_taxa/kingdom_na.fa")
# melt dataframe into long df
m_kingdom = psmelt(kingdom_na)
#Remove all rows were abundance is zero
m_kingdom2 = m_kingdom[m_kingdom$Abundance != 0, ]
# write melt data frame to csv file
write.csv(m_kingdom2, "data/dada2/unclassified_taxa/kingdom_na.csv")
Taxa of interest - example
# subset taxa to include those with no Kingdom assignment
bart = subset_taxa(ps_bact_samp, Genus=="Rickettsia")
# remove taxa with 0 abundance
bart = prune_taxa(taxa_sums(bart) > 0, bart)
# melt dataframe into long df
m_bart= psmelt(bart)
#Remove all rows were abundance is zero
m_bart2 = m_bart[m_bart$Abundance != 0, ]
# identify number of sample positive (i.e. `gDNAID``)
n_distinct(m_bart2$gDNAID)
# identify number of animals positive (i.e. `animalID``)
n_distinct(m_bart2$animalID)
unique(m_bart2$species)
tmp = m_bart2 %>%
filter(SampleCategory == "Blood")
# identify number of sample positive (i.e. `gDNAID``)
n_distinct(tmp$gDNAID)
# identify number of animals positive (i.e. `animalID``)
n_distinct(tmp$animalID)
# List host species
unique(tmp$species)
unique(tmp$tick_species)
sessionInfo()