Background
This workflow is written for analysing amplicon data in QIIME2. Input data is demultiplexed paired-end reads in the file format .fastq
where files are zipped (i.e. .fastq.gz
).
We will not be performing the steps outlined below today but those that would like further information can stay behind for assistance in installing the required programs to reproduce the analysis later.
The code below is written for execution via the command line (bash language).
This workflow utilsing commandline interface with QIIME2.
Requires miniconda/conda, see here
QIIME2 version = qiime2-2021.4
Activate qiime2 environment
conda activate qiime2-2021.4
To view .qza
and .qzv
files go to QIIME2 view
Assumes paired-end data that does not require demultiplexing
Place raw data files in zipped format (i.e. .fastq.gz
in a directory called raw_data/
).he raw data can be downloaded from repository link here.
In Casava 1.8 demultiplexed (paired-end) format, there are two .fastq.gz
files for each sample in the study, each containing the forward or reverse reads for that sample. The file name includes the sample identifier. The forward and reverse read file names for a single sample might look like XXXX_XXXX_L001_R1_001.fastq.gz and XXXX_XXXX_L001_R2_001.fastq.gz, respectively. The underscore-separated fields in this file name are:
Note: QIIME2 input however is zipped fastq files i.e. .fastq.gz
.
In this analysis the raw data files names are:
SampleID-student_Library_L001_R1_001.fastq.gz
S1-TC_S06_L001_R1_001.fastq.gz
Import .fastq.gz
data into QIIME2 format using Casava 1.8 demultiplexed (paired-end) option. Remember assumes raw data is in directory labelled raw_data/
and file naming format as above.
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path raw_data \
--input-format CasavaOneEightSingleLanePerSampleDirFmt \
--output-path 16S_demux_seqs.qza
To inspect raw reads we generate a qzv
file to view in the QIIME2 view website.
Use this output to choose your parameters for QC such as trimming low quality sequences and truncating sequence length.
qiime demux summarize \
--i-data 16S_demux_seqs.qza \
--o-visualization 16S_demux_seqs.qzv
YOUR TURN: Navigate to the website https://view.qiime2.org/ and upload the file sequence quality file that is in the directory data/qiime2/16S_demux_seqs.qzv
. Drag and drop it into the QIIME2 view website.
Denoise using dada2
Based on quality plot in the above output 16S_demux_seqs.qza
adjust trim length to where quality falls.
Then you can also trim primers. In this case working with 16S v4 data with the following primers
515F - GTGYCAGCMGCCGCGGTAA (19 nt) 806R - GGACTACNVGGGTWTCTAAT (20 nt)
qiime dada2 denoise-paired \
--i-demultiplexed-seqs 16S_demux_seqs.qza \
--p-trim-left-f 19 \
--p-trim-left-r 20 \
--p-trunc-len-f 250 \
--p-trunc-len-r 250 \
--o-table 16S_denoise_table.qza \
--o-representative-sequences 16S_denoise_rep-seqs.qza \
--o-denoising-stats 16S_denoise-stats.qza
At this stage, you will have artifacts containing the feature table, corresponding feature sequences, and DADA2 denoising stats. Now we will want to generate summaries of these
qiime feature-table summarize \
--i-table 16S_denoise_table.qza \
--o-visualization 16S_denoise_table.qzv \
qiime feature-table tabulate-seqs \
--i-data 16S_denoise_rep-seqs.qza \
--o-visualization 16S_denoise_rep-seqs.qzv
qiime metadata tabulate \
--m-input-file 16S_denoise-stats.qza \
--o-visualization 16S_denoise-stats.qzv
YOUR TURN: Navigate to the website https://view.qiime2.org/ and upload the file sequence quality file that is in the directory data/qiime2/16S_denoise-stats.qzv
. Drag and drop it into the QIIME2 view website.
To produce an ASV table with number of each ASV reads per sample that you can open in excel. Use tutorial here
Need to make biom file first
qiime tools export \
--input-path 16S_denoise_table.qza \
--output-path feature-table
biom convert \
-i feature-table/feature-table.biom \
-o feature-table/feature-table.tsv \
--to-tsv
Several downstream diversity metrics, available within QIIME 2, require that a phylogenetic tree be constructed using the Operational Taxonomic Units (OTUs) or Amplicon Sequence Variants (ASVs) being investigated. Documentation here
qiime phylogeny align-to-tree-mafft-fasttree \
--i-sequences rep-seqs.qza \
--o-alignment aligned-rep-seqs.qza \
--o-masked-alignment masked-aligned-rep-seqs.qza \
--o-tree unrooted-tree.qza \
--o-rooted-tree rooted-tree.qza
Export
Covert trees output to newick formatted file
qiime tools export \
--input-path unrooted-tree.qza \
--output-path exported-unrooted-tree
qiime tools export \
--input-path rooted-tree.qza \
--output-path exported-rooted-tree
Assign taxonomy to denoised sequences using a pre-tarined naive bayes classifier and the q2-feature-classifier plugin. Details on how to create a classifier are available here.
Note that taxonomic classifiers perform best when they are trained based on your specific sample preparation and sequencing parameters, including the primers that were used for amplification and the length of your sequence reads.
qiime feature-classifier classify-sklearn \
--i-classifier ~/db/QIIME2_trained_classifiers_2021.4/silva-138-99-nb-classifier.qza \
--i-reads 16S_denoise_rep-seqs.qza \
--o-classification qiime2-taxa-silva/taxonomy.qza
qiime metadata tabulate \
--m-input-file qiime2-taxa-silva/taxonomy.qza \
--o-visualization qiime2-taxa-silva/taxonomy.qzv
Generate a taxonomy table table to tsv from the commandline
qiime tools export \
--input-path qiime2-taxa-silva/taxonomy.qza \
--output-path exports
Copyright, Siobhon Egan, 2021.