REQUIRED - We will be using RStudio to analyse the data set. It is recommended that you have the following installed: RStudio version 1.4 or later and R version 4.0 or later. Further details on getting started in RStudio are available here.
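One way to confirm that an installed R meets the recommended minimum is to compare version strings from the terminal. The sketch below is only an illustration: `version_ge` is a hypothetical helper name (not part of R or RStudio), and it assumes plain dotted numeric versions such as 4.1.2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when version $1 >= version $2.
# Versions are compared field by field as numbers; missing fields count as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ".")
    m = split(b, y, ".")
    max = (n > m) ? n : m
    for (i = 1; i <= max; i++) {
      if ((x[i] + 0) > (y[i] + 0)) exit 0
      if ((x[i] + 0) < (y[i] + 0)) exit 1
    }
    exit 0
  }'
}
```

If R is on your PATH, something like `version_ge "$(Rscript -e 'cat(as.character(getRversion()))')" 4.0 && echo "R is recent enough"` should print the message when the installed version is 4.0 or later.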
If you are following along within the BIO513 Google RStudio cloud, the data is available for you in the directory data/.
If you are following along on your own computer, the easiest way is to download this GitHub repository using either option A or B below:
A. Go to https://github.com/siobhon-egan/2022-systMed-genomics and click on the green Code button. Select Download ZIP, then open/unzip the file. Open the .Rmd files in RStudio and you will be able to follow along with the data analysis.
B. Use the terminal to clone the GitHub repo:
git clone https://github.com/siobhon-egan/2022-systMed-genomics.git
Many of the data files used are contained in the data/ directory for you.
As raw .fastq sequence files are large, I have not included them in the GitHub repo. You can download them by following the instructions below:
You can retrieve the data from https://sra-explorer.info by searching for project accession PRJNA493625. Make sure you use the same data directory location (i.e. data/ngs/illumina) to ensure the code below runs smoothly.
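#!/usr/bin/env bash
# Run from within data/ngs/illumina so the files land where the later code expects them.
# Each run has two files: forward (_1) and reverse (_2) reads.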
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR794/003/SRR7943543/SRR7943543_1.fastq.gz -o SRR7943543_16S_V4_of_human_feces_PBS39_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR794/003/SRR7943543/SRR7943543_2.fastq.gz -o SRR7943543_16S_V4_of_human_feces_PBS39_2.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR794/004/SRR7943544/SRR7943544_1.fastq.gz -o SRR7943544_16S_V4_of_human_feces_PBS63_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR794/004/SRR7943544/SRR7943544_2.fastq.gz -o SRR7943544_16S_V4_of_human_feces_PBS63_2.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR794/008/SRR7943538/SRR7943538_1.fastq.gz -o SRR7943538_16S_V4_of_human_feces_PBS49_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR794/008/SRR7943538/SRR7943538_2.fastq.gz -o SRR7943538_16S_V4_of_human_feces_PBS49_2.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR794/009/SRR7943539/SRR7943539_1.fastq.gz -o SRR7943539_16S_V4_of_human_feces_PBS43_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR794/009/SRR7943539/SRR7943539_2.fastq.gz -o SRR7943539_16S_V4_of_human_feces_PBS43_2.fastq.gz
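Because the files are large, a download can be interrupted without an obvious error. As a rough sanity check, the sketch below confirms that every `*_1.fastq.gz` file in a directory has a matching `*_2.fastq.gz` and that both pass gzip's integrity test. `check_pairs` is a hypothetical helper name introduced here for illustration; it is not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check paired FASTQ downloads in directory $1.
# Prints one status line per *_1.fastq.gz file and returns non-zero
# if any mate is missing or any file fails `gzip -t`.
check_pairs() {
  local dir="$1" status=0 r1 r2
  for r1 in "$dir"/*_1.fastq.gz; do
    # If the glob matched nothing, $r1 is the literal pattern and does not exist.
    [ -e "$r1" ] || { echo "no *_1.fastq.gz files in $dir"; return 1; }
    r2="${r1%_1.fastq.gz}_2.fastq.gz"
    if [ ! -f "$r2" ]; then
      echo "MISSING mate for $(basename "$r1")"
      status=1
    elif gzip -t "$r1" 2>/dev/null && gzip -t "$r2" 2>/dev/null; then
      echo "OK $(basename "$r1")"
    else
      echo "CORRUPT $(basename "$r1") or its mate"
      status=1
    fi
  done
  return "$status"
}
```

Assuming the files were saved to the location named above, `check_pairs data/ngs/illumina` would be run from the top of the repository; any line not starting with OK suggests that file should be downloaded again.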
You can retrieve the data from https://sra-explorer.info by searching for project accession PRJNA521754. Make sure you use the same data directory location (i.e. data/ngs/pacbio) to ensure the code below runs smoothly.
#!/usr/bin/env bash
# Run from within data/ngs/pacbio so the files land where the later code expects them.
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR855/009/SRR8557479/SRR8557479_subreads.fastq.gz -o SRR8557479_Full-length_16S_amplicon_sequencing_human_feces_subreads.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR855/000/SRR8557480/SRR8557480_subreads.fastq.gz -o SRR8557480_Full-length_16S_amplicon_sequencing_human_feces_subreads.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR855/008/SRR8557478/SRR8557478_subreads.fastq.gz -o SRR8557478_Full-length_16S_amplicon_sequencing_human_feces_subreads.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR855/007/SRR8557477/SRR8557477_subreads.fastq.gz -o SRR8557477_Full-length_16S_amplicon_sequencing_human_feces_subreads.fastq.gz
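The download URLs in the scripts above share a visible pattern: the first six characters of the run accession, then a folder made of `00` plus the accession's last digit, then the accession itself. The sketch below rebuilds that folder URL from an accession. It is inferred only from the URLs listed here and assumes ten-character accessions like these; `ena_fastq_dir` is a hypothetical helper name, and other accession lengths may be laid out differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the FTP folder for a ten-character run accession,
# following the pattern seen in the URLs above (an assumption, not an official API).
ena_fastq_dir() {
  local acc="$1" prefix last
  case "$acc" in
    [SED]RR[0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
    *) echo "unsupported accession: $acc" >&2; return 1 ;;
  esac
  prefix=$(printf '%s' "$acc" | cut -c1-6)   # e.g. SRR855
  last=${acc#"${acc%?}"}                     # final character, e.g. 9
  printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/00%s/%s\n' "$prefix" "$last" "$acc"
}
```

For example, `ena_fastq_dir SRR8557479` prints `ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR855/009/SRR8557479`, which matches the folder used in the first PacBio download line.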
Copyright, Siobhon Egan, 2022.