Content for BIO513 genomics data analysis workshop 2022.

Outline

This lecture provides an introduction to the data types and analyses used in genomics. The advent of high-throughput sequencing platforms has resulted in large volumes of data being produced. It is important to know how to manage and interpret these data in a reproducible way.

This lecture will cover the following aspects:

Sequence data and databases

Databases and online resources for genomics
Common sequence data file formats (e.g. .fastq, .fasta, .bam, .fast5)
What data each file type contains and how to interpret that information
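
As a small illustration of what a sequence file contains, the sketch below reads FASTQ records and converts the quality line into numeric Phred scores. It is written in Python purely for illustration (the workshop itself uses the Unix shell and R). It assumes four lines per record and Phred+33 quality encoding. The reads in `FASTQ_TEXT` and the helper `read_fastq` are hypothetical and are not part of the workshop data.

```python
from io import StringIO

# Hypothetical two-record FASTQ text, invented for illustration only.
FASTQ_TEXT = """@read1
ACGTACGT
+
IIIIHHHH
@read2
GGCCTTAA
+
!!!!####
"""


def read_fastq(handle):
    """Yield (name, sequence, scores) from a FASTQ handle.

    Assumes exactly four lines per record and Phred+33 quality encoding.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # separator line starting with '+'
        quality = handle.readline().rstrip("\n")
        # Phred+33: score = ASCII code of the quality character minus 33
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores


records = list(read_fastq(StringIO(FASTQ_TEXT)))
for name, sequence, scores in records:
    print(name, sequence, scores)
```

Each record carries a read name, the base calls, and one quality score per base. A higher score means a lower estimated probability that the base call is wrong.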

Tools for analyzing data

Tools to query, inspect, visualize sequence files
Demultiplexing, merging, trimming and quality filtering
Methods of assembly and clustering for different types of sequence data
- Amplicon sequencing (clustering/denoising into taxonomic units)
- Shotgun sequencing (assembling contigs and scaffolds)
Be familiar with both the Unix shell and the R environment (including packages) for working with sequence data
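
The trimming and quality filtering step listed above can be sketched as follows. This is a minimal Python illustration, not the method used by any particular tool. The thresholds (`min_q`, `min_len`, `min_mean_q`), the function names, and the example reads are all hypothetical. Dedicated trimming tools typically use more refined rules, such as sliding windows.

```python
def trim_3prime(sequence, scores, min_q=20):
    """Drop bases from the 3' end until one has quality >= min_q."""
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return sequence[:end], scores[:end]


def passes_filter(scores, min_len=5, min_mean_q=25):
    """Keep a read only if it is long enough and its mean quality is high enough."""
    if len(scores) < min_len:
        return False
    return sum(scores) / len(scores) >= min_mean_q


# Hypothetical reads as (name, sequence, Phred scores), invented for illustration.
reads = [
    ("read1", "ACGTACGT", [40, 40, 38, 37, 30, 25, 10, 5]),
    ("read2", "GGCCTTAA", [12, 10, 8, 8, 5, 4, 2, 2]),
]

kept = []
for name, sequence, scores in reads:
    sequence, scores = trim_3prime(sequence, scores)
    if passes_filter(scores):
        kept.append((name, sequence))

print(kept)
```

Here the first read loses its two low-quality trailing bases and is kept. The second read is trimmed to nothing and is discarded.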

Set up

Important note: the lessons outlined here are designed to be delivered through the BIO513 unit RStudio/Google Cloud server.

Please navigate to the RStudio server and log in with the details provided earlier in the unit.

Create a new R Markdown file by going to File > New File > R Markdown …

See the setup page for details on obtaining data.
