University of Bologna
Genomics Course
Bioinformatics Lab
Teacher: Prof. Federico M. Giorgi
Teaching Assistant: Dr. Chiara Cabrelle
Duration: 60 hours (15 modules of ~4 hours + optional extras)
Exam: Oral
The course gives a practical overview of the tools, approaches and
techniques that a competitive bioinformatician needs in 2019.
Module 1: Introduction to and testing of the working environment
- VirtualBox
- Linux refresher
- Playing with a FASTA file: wc, grep, htop, regex, sed
- EMBOSS suite
- Remove/install programs using apt (htop)
- Projects and Exercise structure
Module 2: Phylogenetic Sequence Analysis
- Sequence databases: how to download sequences from NCBI
- Building a phylogenetic multifasta (MYC family)
- Multiple Sequence Alignment (MUSCLE, ClustalW, T-Coffee)
- Building a Phylogenetic Tree (PHYLIP)
- Phylogenetic GUI: MEGA
Module 3: Remote Homology Detection
- BLAST introduction
- Create, format and index a sequence database (BLAST formatdb)
- BLASTN/BLASTP/TBLASTX with various options
- Discover the organism of a mysterious sequence
- PSI-BLAST
Module 4: Introduction to Next Generation Sequencing
- FASTA vs FASTQ, PHRED score
- FASTQ libraries: single-end and paired-end
- FastQC
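To illustrate the PHRED score topic above, here is a minimal Python sketch of how a FASTQ quality string maps to scores and error probabilities. It assumes the Phred+33 encoding (Sanger / Illumina 1.8+). The function names and the toy read are illustrative only and not part of the course material.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string to a list of integer PHRED scores.

    Assumes each character encodes Q as chr(Q + offset); offset=33 is the
    Phred+33 convention (an assumption, older Illumina data used 64).
    """
    return [ord(char) - offset for char in quality]


def error_probability(q):
    """Probability that a base call is wrong, from Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


# Toy FASTQ record (hypothetical read): header, sequence, separator, qualities
record = ["@read1", "ACGT", "+", "II5!"]

scores = phred_scores(record[3])
for base, q in zip(record[1], scores):
    print(base, q, error_probability(q))
```

A score of 20 corresponds to a 1 in 100 chance of a wrong base call, and a score of 40 to 1 in 10,000.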
Module 5: NGS Alignment
- Aligners: Bowtie, BWA, HISAT
- BAM files
- Samtools: process and visualize BAM files
- Integrative Genomics Viewer (IGV): visualize alignments
Module 6: Calling Mutations
- Exercise: generate BAMs
- Using VarScan
- Visualizing mutations and indels with IGV
- Larger mutations: CNVs and translocations
- GATK
- KisSplice: calling mutations from RNA reads
Module 7: RNA-Seq
- Spliced aligners (TopHat, STAR, HISAT)
- Finding new transcripts (Cufflinks)
- Converting BAMs to counts (GFF, htseq-count)
- Finding contaminants in human RNA-Seq vs. other genomes (unaligned reads vs. H. pylori)
Module 8: ChIP-Seq
- Exercise: align reads again
- Input reads
- Call Peaks (MACS)
- Find enriched motifs (HOMER)
- Upload Custom ENCODE tracks on Genome Browser
Module 9 (Short): Assembly
- Assembling a small bacterial genome with DNA reads with MIRA
- Classic DNA Assembly with ABySS or Velvet
- Assembling E2F3 gene with long DNA reads with Canu
- Assembling RNA-Seq transcripts with Trinity
*** end of R-free course ***
Module 10 (Long): (re)introduction to R
- Basic commands up to sapply
- RStudio
- Scatterplots, Boxplots, Violin Plots, Heatmaps
- RCircos
- Bioconductor
- Gene ID conversion
- Genomic Ranges
Module 11: Differential Expression Analysis
- Loading counts
- Normalization: RPM vs RPKM vs TPM vs Size Factors vs voom
- edgeR vs DESeq2
- Comparing two datasets
- Complex Designs
- Confounding Variables (cancer vs. normal with age difference)
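The differences between the first three normalizations listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the gene names, counts and lengths are invented toy values, the function names are hypothetical, and size factors and voom are not covered because they need a full count matrix across samples.

```python
def rpm(counts):
    """Reads per million: scale each count by the library size only."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def rpkm(counts, lengths_bp):
    """Reads per kilobase per million: library size first, then gene length."""
    total = sum(counts.values())
    return {gene: c * 1e9 / (total * lengths_bp[gene])
            for gene, c in counts.items()}


def tpm(counts, lengths_bp):
    """Transcripts per million: gene length first, then rescale to sum to 1e6."""
    rate = {gene: c / lengths_bp[gene] for gene, c in counts.items()}
    rate_sum = sum(rate.values())
    return {gene: r * 1e6 / rate_sum for gene, r in rate.items()}


# Toy single-sample data (invented values, for illustration only)
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 3000}

print(rpm(counts))
print(rpkm(counts, lengths_bp))
print(tpm(counts, lengths_bp))
```

TPM values always sum to one million within a sample, which is why they are easier to compare across samples than RPKM values, whose sum depends on the length distribution of the expressed genes.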
Module 12: Microarrays
- Concept
- Three steps: BG correction, normalization, summarization
- RMA vs. MAS5
- Differential Expression with limma
- Comparing microarrays with RNA-Seq
Module 13: Single Cell RNA-Seq
- Dropout effects and biases
- Clustering
- Seurat pipeline
- Cell Cycle bias removal
- Differential Expression and comparison with bulk RNA
Module 14: Differential Binding Analysis
- Estrogen treatment with DiffBind package
- How to assign peaks to promoters and then to genes (GRanges)
- VULCAN package?
Module 15: Pathway Enrichment Analysis
- Databases: Gene Ontology, MSigDB, Reactome, BioCarta, KEGG, MapMan
- Discrete enrichments: topGO package
- Continuous enrichments: GSEA
- External resources: DAVID, GOrilla
*** Extra Modules ***
Module 16: Coexpression Analysis in R
- Correlation: Pearson, Spearman, Kendall
- Mutual Information
- Partial Correlation (A,B,C)
- Overlap with ENCODE and MSigDB data
- ARACNe
Module 17: Alternative Transcript Counters
Module 18: Detecting Gene Fusions
- RNA: TopHat-Fusion
- DNA: big translocation finders?
Module 19: Simple Machine Learning
- Predicting Mutations with Gene Expression
- glmnet, lasso, gradient boosting, caret package
Module 20: Survival Analysis
- Kaplan-Meier Curves
- Tests
- Multiple groups
- Comparing datasets
Module 21: Building an R plot with lattice
Module 22: Clustering analysis in R
- Hierarchical clustering (hclust and pvclust)
- Treecut and dynamic treecut
- k-means
- Principal Component Analysis and t-SNE
Module 23: DNA shape prediction in R
- The DNA shape properties (MGW, HelT, PropT, Roll, EP)
- DNAshapeR package
- Show the shape of similar promoters (H. pylori project)