Content for BIO513 genomics data analysis workshop 2022.

Outline

This lecture provides an introduction to the data types and analyses used in genomics. The advent of high-throughput sequencing platforms has resulted in large volumes of data being produced. It is important to know how to manage and interpret this data in a reproducible way.

This lecture will cover the following aspects:

  1. Sequence data and databases
  • Databases and online resources for genomics
  • Common sequence data file formats (e.g. .fastq, .fasta, .bam, .fast5)
  • What data each file format contains and how to interpret it
  2. Tools for analyzing data
  • Tools to query, inspect and visualize sequence files
  • Demultiplexing, merging, trimming and quality filtering
  • Methods of assembly/clustering for the different types of sequence data
    • Amplicon sequencing (clustering/denoising into taxonomic units)
    • Shotgun sequencing (assembling contigs and scaffolds)
  • Be familiar with both the Unix shell and the R environment (including packages) for working with sequence data
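
As a small taste of what inspecting a sequence file in the Unix shell can look like, here is a minimal sketch. It assumes a standard four-line-per-record FASTQ layout; the file name `example.fastq` and the two reads in it are made up for illustration and are not part of the workshop data.

```shell
# Write a tiny, made-up two-read FASTQ file (hypothetical data, illustration only)
printf '%s\n' \
  '@read1' 'ACGTACGT' '+' 'IIIIIIII' \
  '@read2' 'GGCCTTAA' '+' '!!!!IIII' > example.fastq

# Each FASTQ record spans four lines: header, sequence, separator, quality string,
# so the number of reads is the line count divided by four
n_lines=$(wc -l < example.fastq)
n_reads=$((n_lines / 4))
echo "reads: $n_reads"

# Print the read ID (header without the leading @) and the sequence length per record
awk 'NR % 4 == 1 { id = substr($0, 2) } NR % 4 == 2 { print id, length($0) }' example.fastq
```

Dividing the line count by four only works when records are not wrapped across multiple lines, which is the usual case for FASTQ files produced by sequencing platforms but is worth checking on unfamiliar data.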

Set up

Important note: the lessons outlined here are designed for delivery through the BIO513 unit RStudio/Google Cloud server.

Please navigate to the RStudio server and log in with the details provided earlier in the unit.

Create a new R Markdown file by going to File > New File > R Markdown …

See the setup page for details on obtaining data.




Copyright, Siobhon Egan, 2022.