QIIME2

Last updated: 2021-07-03



Analysis of 16S rRNA (hypervariable region v1-2) metabarcoding.

Raw Illumina MiSeq .fastq.gz reads were analysed with the QIIME2-2020.11 pipeline, using DADA2 denoising to generate amplicon sequence variants (ASVs).

Background

This workflow is written for analysing amplicon data in QIIME2. The input data is Illumina MiSeq paired-end data prepared using Nextera XT indexes, so no additional demultiplexing steps are needed in this case. Should your data require demultiplexing, that step can be added before import.

0. Install & activate QIIME2 environment (commandline)

This workflow uses the QIIME2 command-line interface.

Requires miniconda/conda, see here

Latest version at the time of writing is QIIME2-2020.11; see the QIIME2 documentation for installation instructions for your platform.

Activate qiime2 environment

conda activate qiime2-2020.11
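Before moving on, it can help to confirm that the environment is really active in the current shell. The following is a minimal sketch; the helper name require_cmd is hypothetical and not part of QIIME2 or conda.

```shell
# Hypothetical helper: succeed only if a command is available on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report whether the qiime executable is visible in the current shell.
if require_cmd qiime; then
  echo "qiime found: $(command -v qiime)"
else
  echo "qiime not found; activate the conda environment first" >&2
fi
```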

1. Input data

Assumes paired-end data that does not require demultiplexing

Place raw data files in zipped format (i.e. .fastq.gz) in a directory called raw_data/.

File naming conventions

In Casava 1.8 demultiplexed (paired-end) format, there are two .fastq.gz files for each sample in the study, each containing the forward or reverse reads for that sample. The file name includes the sample identifier. The forward and reverse read file names for a single sample might look like XXXX_L001_R1_001.fastq.gz and XXXX_L001_R2_001.fastq.gz, respectively. The underscore-separated fields in this file name are:

the sample identifier,
the barcode sequence or a barcode identifier,
the lane number,
the direction of the read (i.e. R1 or R2), and
the set number.
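The five fields listed above can be pulled apart with standard shell tools, which is a quick way to check that a file name follows the convention. This is only a sketch: the function name parse_casava_name and the file name sampleA_15_L001_R1_001.fastq.gz are made up for illustration, and it assumes the sample identifier itself contains no underscores.

```shell
# Split a Casava 1.8 style file name into its five underscore-separated fields.
# Assumes the sample identifier itself contains no underscores.
parse_casava_name() {
  base=$(basename "$1" .fastq.gz)
  echo "$base" | awk -F'_' '{
    printf "sample=%s barcode=%s lane=%s read=%s set=%s\n", $1, $2, $3, $4, $5
  }'
}

# Hypothetical file name used purely for illustration.
parse_casava_name "sampleA_15_L001_R1_001.fastq.gz"
```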

Depending on the sequencing facility, you may need to add the _001 suffix (the set number) to sample file names.

Note that you do not need to unzip the fastq data before analysis.

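From the directory that contains raw_data/, rename the files to match this format (the 0_BPDNR pattern is specific to this data set; adjust it for your own file names):

for file in raw_data/*.fastq.gz
do
  # Strip the facility-specific tag, then add the _001 set number before .fastq
  newname=$(echo "$file" | sed 's/0_BPDNR//' | sed 's/\.fastq/_001.fastq/')
  mv "$file" "$newname"
done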
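Because mv overwrites silently, it may be worth previewing the new names before renaming anything. The sketch below applies the same sed expressions as the loop above (with the dot escaped) but only prints the result; preview_newname is a hypothetical helper name.

```shell
# Print the name the rename loop would give a file, without moving anything.
# The 0_BPDNR tag is specific to the example data.
preview_newname() {
  echo "$1" | sed 's/0_BPDNR//' | sed 's/\.fastq/_001.fastq/'
}

for file in raw_data/*.fastq.gz; do
  [ -e "$file" ] || continue   # skip if the glob matched nothing
  echo "$file -> $(preview_newname "$file")"
done
```

If two files would map to the same new name, that shows up here as a repeated right-hand side.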

Import as QIIME2 artefact

Import the .fastq.gz data into QIIME2 format using the Casava 1.8 demultiplexed (paired-end) option. Remember, this assumes the raw data is in a directory labelled raw_data/ and that file names follow the format above.

qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path raw_data \
--input-format CasavaOneEightSingleLanePerSampleDirFmt \
--output-path 16S_demux_seqs.qza

In this case we are using Nextera indexes, which means the reads are demultiplexed automatically by BaseSpace, so any demultiplexing steps can be skipped.
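If the import fails or a sample seems to be missing, one thing worth checking is that every forward-read file has a reverse-read partner. A minimal sketch, assuming file names end in _R1_001.fastq.gz and _R2_001.fastq.gz as described above; find_unpaired is a hypothetical helper name.

```shell
# List forward-read files in a directory that lack a matching reverse-read file.
# Assumes Casava-style names ending in _R1_001.fastq.gz / _R2_001.fastq.gz.
find_unpaired() {
  for fwd in "$1"/*_R1_001.fastq.gz; do
    [ -e "$fwd" ] || continue   # skip if the glob matched nothing
    rev=$(echo "$fwd" | sed 's/_R1_001\.fastq\.gz$/_R2_001.fastq.gz/')
    [ -e "$rev" ] || echo "$fwd"
  done
}

# Prints nothing when every forward file has a partner.
find_unpaired raw_data
```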

Inspect reads for quality

To inspect raw reads:

qiime demux summarize \
  --i-data 16S_demux_seqs.qza \
  --o-visualization 16S_demux_seqs.qzv

View this output by loading it in QIIME2 View. Use it to choose your QC parameters, such as where to trim low-quality bases and where to truncate reads.

Sample metadata

This file holds the metadata associated with your samples (e.g. host information, sampling data, etc.). Tutorial here

The metadata needs to be in .tsv format. The best way to do this is to access the QIIME2 Google Sheet example, save a copy and edit/add in your sample details, then select File > Download as > Tab-separated values. Alternatively, the command wget -O "sample-metadata.tsv" "https://data.qiime2.org/2020.11/tutorials/moving-pictures/sample_metadata.tsv" will download the example sample metadata as tab-separated text and save it in the file sample-metadata.tsv. It is important that you don't change the header of the first column, sample-id.
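A quick sanity check on the finished file can catch a renamed first column before QIIME2 does. The sketch below is an assumption-laden helper (check_metadata is a hypothetical name): it only tests that the first header is exactly sample-id and counts rows that are neither blank nor comment lines such as #q2:types.

```shell
# Check that a metadata TSV starts with the sample-id header
# and report how many sample rows it holds.
check_metadata() {
  awk -F'\t' '
    BEGIN { bad = 0; n = 0 }
    NR == 1 { if ($1 != "sample-id") { bad = 1; exit }; next }
    $1 !~ /^#/ && $1 != "" { n++ }   # skip comment lines and blank IDs
    END {
      if (bad) { print "first column header is not sample-id"; exit 1 }
      printf "%d samples\n", n
    }
  ' "$1"
}

# Only run the check if the file is present in the working directory.
if [ -f sample-metadata.tsv ]; then
  check_metadata sample-metadata.tsv
fi
```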

2. Sequence quality control and feature table construction

Denoise using dada2

Based on the quality plot in the 16S_demux_seqs.qzv output generated above, set the truncation lengths to where read quality drops off.

You can also trim primers at this step by setting the left-trim values to the primer lengths. The example data is amplicon NGS data targeting bacteria using the 16S rRNA hypervariable region 1-2, with the following primers:

27F-Y (20 nt): AGAGTTTGATCCTGGCTYAG #16S v1-2 primer, ref Gofton et al. Parasites & Vectors (2015) 8:345
338R (19 nt): TGCTGCCTCCCGTAGGAGT #16S v1-2 primer, ref Turner et al. J Eukaryot Microbiol (1999) 46(4):32
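The left-trim values used in the command that follows are simply the primer lengths, which the shell can count directly. A small sketch; the variable names are illustrative, and it assumes each read starts with the full primer.

```shell
# Primer sequences as listed above.
primer_f="AGAGTTTGATCCTGGCTYAG"   # 27F-Y
primer_r="TGCTGCCTCCCGTAGGAGT"    # 338R

# ${#var} expands to the string length, i.e. the number of bases to trim.
trim_left_f=${#primer_f}
trim_left_r=${#primer_r}

echo "--p-trim-left-f $trim_left_f --p-trim-left-r $trim_left_r"
```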

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs 16S_demux_seqs.qza \
  --p-trim-left-f 20 \
  --p-trim-left-r 19 \
  --p-trunc-len-f 250 \
  --p-trunc-len-r 250 \
  --o-table 16S_denoise_table.qza \
  --o-representative-sequences 16S_denoise_rep-seqs.qza \
  --o-denoising-stats 16S_denoise-stats.qza
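When choosing truncation lengths, keep in mind that DADA2 can only merge read pairs that still overlap after truncation (its documented default minimum is 12 nt). The arithmetic can be sketched as below. The helper expected_overlap is hypothetical, and the 320 bp amplicon length is an illustrative assumption, not a measured value for this data set; it assumes reads start at the primers.

```shell
# Estimate how many bases the forward and reverse reads overlap after
# truncation, for reads that start at the primers.
expected_overlap() {
  # $1 = truncation length (forward), $2 = truncation length (reverse),
  # $3 = amplicon length in bp, primers included
  echo $(( $1 + $2 - $3 ))
}

# 320 bp is a hypothetical amplicon length used only for illustration.
overlap=$(expected_overlap 250 250 320)
echo "expected overlap: $overlap nt"
```

If the result is close to or below the minimum, most pairs will fail to merge and the denoising stats will show a large drop at the merging step.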

At this stage, you will have artifacts containing the feature table, corresponding feature sequences, and DADA2 denoising stats. You can generate summaries of these as follows.

qiime feature-table summarize \
  --i-table 16S_denoise_table.qza \
  --o-visualization 16S_denoise_table.qzv \
  --m-sample-metadata-file sample-metadata.tsv # Can skip this bit if needed.

qiime feature-table tabulate-seqs \
  --i-data 16S_denoise_rep-seqs.qza \
  --o-visualization 16S_denoise_rep-seqs.qzv

qiime metadata tabulate \
  --m-input-file 16S_denoise-stats.qza \
  --o-visualization 16S_denoise-stats.qzv

Merging denoised artefacts

To merge denoised data sets into a single FeatureTable[Frequency] artifact and a single FeatureData[Sequence] artifact:

qiime feature-table merge \
  --i-tables table-1.qza \
  --i-tables table-2.qza \
  --o-merged-table table.qza
qiime feature-table merge-seqs \
  --i-data rep-seqs-1.qza \
  --i-data rep-seqs-2.qza \
  --o-merged-data rep-seqs.qza
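With more than two runs, the --i-tables option is repeated once per table, in the same way as for the two tables above. The sketch below only builds and prints such a command for review; merge_tables_cmd and the third file name are hypothetical, and it assumes file names without spaces.

```shell
# Build a merge command with one --i-tables option per input table.
# Only prints the command so it can be reviewed before running.
merge_tables_cmd() {
  out=$1; shift
  args=""
  for t in "$@"; do
    args="$args --i-tables $t"
  done
  echo "qiime feature-table merge$args --o-merged-table $out"
}

# Hypothetical file names for three sequencing runs.
merge_tables_cmd table.qza table-1.qza table-2.qza table-3.qza
```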

Export ASV table

To produce an ASV table, with the number of reads of each ASV per sample, that you can open in Excel, follow the tutorial here.

First export the feature table as a .biom file, then convert it to .tsv:

qiime tools export \
--input-path 16S_denoise_table.qza \
--output-path feature-table

biom convert \
-i feature-table/feature-table.biom \
-o feature-table/feature-table.tsv \
--to-tsv
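The resulting TSV can also be summarised without opening Excel. The sketch below sums reads per sample; it assumes the layout biom convert normally writes (a comment line, then a header row beginning with "#OTU ID", then one row per ASV), and sample_totals is a hypothetical helper name.

```shell
# Sum reads per sample from the TSV written by biom convert.
# Assumes a header row starting with "#OTU ID" and one ASV per data row.
sample_totals() {
  awk -F'\t' '
    /^#OTU ID/ { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
    /^#/       { next }   # skip other comment lines
               { for (i = 2; i <= ncol; i++) total[i] += $i }
    END        { for (i = 2; i <= ncol; i++) printf "%s\t%d\n", name[i], total[i] }
  ' "$1"
}

# Only run if the exported table is present.
if [ -f feature-table/feature-table.tsv ]; then
  sample_totals feature-table/feature-table.tsv
fi
```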

Phylogenetic tree

Several downstream diversity metrics, available within QIIME 2, require that a phylogenetic tree be constructed using the Operational Taxonomic Units (OTUs) or Amplicon Sequence Variants (ASVs) being investigated. Documentation here

qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences rep-seqs.qza \
  --o-alignment aligned-rep-seqs.qza \
  --o-masked-alignment masked-aligned-rep-seqs.qza \
  --o-tree unrooted-tree.qza \
  --o-rooted-tree rooted-tree.qza

Export

Convert the unrooted tree output to a Newick-formatted file

qiime tools export \
  --input-path unrooted-tree.qza \
  --output-path exported-tree
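One simple check on the exported tree is that its number of tips matches the number of representative sequences. In Newick format a tree with n tips contains n - 1 commas, so the tips can be counted as sketched below. This assumes the export wrote exported-tree/tree.nwk and that no tip label contains a comma; count_tips is a hypothetical helper name.

```shell
# Count tips in a Newick tree: a tree with n tips contains n - 1 commas,
# assuming no commas inside quoted labels.
count_tips() {
  commas=$(tr -cd ',' < "$1" | wc -c)
  echo $(( commas + 1 ))
}

# Only run if the exported tree is present.
if [ -f exported-tree/tree.nwk ]; then
  count_tips exported-tree/tree.nwk
fi
```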

3. Taxonomy

Assign taxonomy to denoised sequences using a pre-trained Naive Bayes classifier and the q2-feature-classifier plugin. Details on how to create a classifier are available here.

Note that taxonomic classifiers perform best when they are trained based on your specific sample preparation and sequencing parameters, including the primers that were used for amplification and the length of your sequence reads.

qiime feature-classifier classify-sklearn \
--i-classifier /Taxonomy/QIIME2_classifiers_v2020.11/Silva_99_Otus/27F-388Y/classifier.qza \
--i-reads 16S_denoise_rep-seqs.qza \
--o-classification qiime2-taxa-silva/taxonomy.qza

qiime metadata tabulate \
--m-input-file qiime2-taxa-silva/taxonomy.qza \
--o-visualization qiime2-taxa-silva/taxonomy.qzv

To download the sample OTU table, first complete the taxonomy assignment and then make the taxa barplot. You can then download a csv file containing sequence counts, samples and taxonomy. See here

qiime taxa barplot \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization taxa-bar-plots.qzv

Details on sample metadata available here

Extra bit of code to export the taxonomy table to tsv from the command line

qiime tools export \
--input-path taxonomy.qza \
--output-path exports
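Once both the taxonomy and the feature table are in TSV form, they can be joined on the feature ID. The sketch below assumes the export wrote exports/taxonomy.tsv with the columns Feature ID, Taxon and Confidence, and that the feature table TSV has a header row beginning with "#OTU ID". The helper name add_taxonomy and the "Unassigned" label for features without a match are illustrative choices.

```shell
# Append the Taxon column from taxonomy.tsv to each row of the feature table TSV.
add_taxonomy() {
  # $1 = taxonomy.tsv (Feature ID, Taxon, Confidence), $2 = feature-table.tsv
  awk -F'\t' -v OFS='\t' '
    NR == FNR  { if (FNR > 1) taxon[$1] = $2; next }   # first file: store taxa
    /^#OTU ID/ { print $0, "Taxon"; next }             # extend the header row
    /^#/       { next }                                # skip other comments
               { t = ($1 in taxon) ? taxon[$1] : "Unassigned"; print $0, t }
  ' "$1" "$2"
}

# Only run if both exported files are present.
if [ -f exports/taxonomy.tsv ] && [ -f feature-table/feature-table.tsv ]; then
  add_taxonomy exports/taxonomy.tsv feature-table/feature-table.tsv
fi
```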

Extra info

Place to leave some links

Project website by Siobhon L. Egan, 2021. This site was created in R Markdown with workflowr