
7-way nested Venn Diagrams

3:21am, fifth coffee. The late hour inspired a nested 7-way Venn diagram: a blob of shared miRNAs targeting the E2F genes (plus MYCN). A thing of terrible beauty, inside every human cell. The code is below: a simple named list rendered with the venn package.

names(mirnas) <- c("E2F1","E2F2","E2F3","E2F6","E2F7","E2F8","MYCN")
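
For anyone who wants to reproduce the shape of the plot without the real data, here is a minimal sketch. It assumes `mirnas` is a list holding one character vector of miRNA identifiers per target gene; the identifiers below are random placeholders, not the actual miRNAs, and the plotting call is left at the `venn` package defaults, which may differ from the options used for the original figure.

```r
library(venn)  # CRAN package 'venn', draws up to 7 sets

genes <- c("E2F1", "E2F2", "E2F3", "E2F6", "E2F7", "E2F8", "MYCN")

# Placeholder data (assumption): 25 made-up miRNA IDs per gene,
# drawn from a shared pool so that the sets overlap.
set.seed(1)
pool <- paste0("miRNA_", 1:60)
mirnas <- lapply(genes, function(g) sample(pool, 25))
names(mirnas) <- genes

# A named list of vectors is enough: one set per list element.
venn(mirnas)
```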

Bioinformatics Lab Course – Draft Structure

University of Bologna

Genomics Course

Bioinformatics Lab

Teacher: Prof. Federico M. Giorgi

Teaching Assistant: Dr. Chiara Cabrelle

Duration: 60 hours (15 modules of ~4 hours + optional extras)

Exam: Oral

The course aims to give a practical overview of the tools, approaches and techniques that a competitive bioinformatician needs in 2019.

Module 1: Introduction to and testing of the working environment

  • VirtualBox
  • Linux refresher
  • Playing with a FASTA file: wc, grep, htop, regex, sed
  • EMBOSS suite
  • Remove/install programs using apt (htop)
  • Projects and Exercise structure

Module 2: Phylogenetic Sequence Analysis

  • Sequence databases: how to download sequences from NCBI
  • Building a phylogenetic multifasta (MYC family)
  • Multiple Sequence Alignment (MUSCLE, ClustalW, T-Coffee)
  • Building a Phylogenetic Tree (PHYLIP)
  • Phylogenetic GUI: MEGA

Module 3: Remote Homology Detection

  • BLAST introduction
  • Create, format and index a sequence database (BLAST formatdb, superseded by makeblastdb in BLAST+)
  • BLASTN/BLASTP/TBLASTX with various options
  • Discover the organism of a mysterious sequence

Module 4: Introduction to Next Generation Sequencing

  • FASTA vs FASTQ, PHRED score
  • FASTQ libraries: single-end and paired-end
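
The PHRED score item can be illustrated with a few lines of base R. This is a sketch under assumptions: the quality string is made up, and it assumes the Phred+33 encoding, in which each quality character's ASCII code minus 33 gives the score Q, and the error probability is 10^(-Q/10).

```r
# Decode a FASTQ quality string into Phred scores.
# offset = 33 assumes Phred+33 encoding.
phred_scores <- function(qual, offset = 33) {
  utf8ToInt(qual) - offset
}

# Probability that a base call is wrong, given its Phred score Q.
error_prob <- function(q) 10^(-q / 10)

qual <- "II5+!"            # hypothetical quality line of a 5-base read
q <- phred_scores(qual)    # 40 40 20 10 0
p <- error_prob(q)         # 1e-04 1e-04 1e-02 1e-01 1e+00

data.frame(char = strsplit(qual, "")[[1]], phred = q, p_error = p)
```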

Module 5: NGS Alignment

  • Aligners: Bowtie, BWA, HISAT
  • BAM files
  • Samtools: process and visualize BAM files
  • Integrative Genomics Viewer (IGV): visualize alignments

Module 6: Calling Mutations

  • Exercise: generate BAMs
  • Using VarScan
  • Visualizing mutations and indels with IGV
  • Larger mutations: CNVs and translocations
  • GATK
  • KisSplice: calling mutations from RNA reads

Module 7: RNA-Seq

  • Spliced aligners (TopHat, STAR, HISAT)
  • Finding new transcripts (Cufflinks)
  • Converting BAMs to counts (GFF, htseq-count)
  • Finding contaminants in human RNA-Seq vs. other genomes (unaligned reads vs. H. pylori)

Module 8: ChIP-Seq

  • Exercise: align reads again
  • Input reads
  • Call Peaks (MACS)
  • Find enriched motifs (HOMER)
  • Upload custom ENCODE tracks to the Genome Browser

Module 9 (Short): Assembly

  • Assembling a small bacterial genome from DNA reads with MIRA
  • Classic DNA Assembly with ABySS or Velvet
  • Assembling the E2F3 gene from long DNA reads with Canu
  • Assembling RNA-Seq transcripts with Trinity

*** end of R-free course ***

Module 10 (Long): (re)introduction to R

  • Basic commands up to sapply
  • RStudio
  • Scatterplots, Boxplots, Violin Plots, Heatmaps
  • RCircos
  • Bioconductor
  • Gene ID conversion
  • Genomic Ranges

Module 11: Differential Expression Analysis

  • Loading counts
  • Normalization: RPM vs RPKM vs TPM vs Size Factors vs voom
  • edgeR vs DESeq2
  • Comparing two datasets
  • Complex Designs
  • Confounding Variables (cancer vs. normal with age difference)
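
To make the RPM vs RPKM vs TPM item concrete, here is a small base-R sketch on a toy count matrix. The counts and gene lengths are invented for illustration; size factors and voom are not covered here, since they belong to the DESeq2 and limma workflows respectively.

```r
# Toy data (assumption): 4 genes x 2 samples; s2 is s1 sequenced twice as deep.
counts <- matrix(c(10, 20, 30, 40,
                   20, 40, 60, 80),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2")))
length_kb <- c(1000, 2000, 1000, 4000) / 1000  # gene lengths in kilobases

# RPM: reads per million mapped reads (corrects for library size only).
rpm <- t(t(counts) / colSums(counts)) * 1e6

# RPKM: RPM further divided by gene length in kb.
rpkm <- rpm / length_kb

# TPM: length-normalize first, then scale each sample to sum to one million.
rate <- counts / length_kb
tpm <- t(t(rate) / colSums(rate)) * 1e6

colSums(rpkm)  # not constant across datasets in general
colSums(tpm)   # always 1e6 per sample
```

The difference in the order of operations is why TPM columns always sum to one million while RPKM columns do not, which makes TPM values easier to compare between samples.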

Module 12: Microarrays

  • Concept
  • Three steps: BG correction, normalization, summarization
  • RMA vs. MAS5
  • Differential Expression with limma
  • Comparing microarrays with RNA-Seq

Module 13: Single Cell RNA-Seq

  • Dropout effects and biases
  • Clustering
  • Seurat pipeline
  • Cell Cycle bias removal
  • Differential Expression and comparison with bulk RNA

Module 14: Differential Binding Analysis

  • Estrogen treatment with DiffBind package
  • How to assign peaks to promoters to genes (GRanges)
  • VULCAN package?

Module 15: Pathway Enrichment Analysis

  • Databases: Gene Ontology, MSigDB, Reactome, BioCarta, KEGG, MapMan
  • Discrete enrichments: topGO package
  • Continuous enrichments: GSEA
  • External resources: DAVID, GOrilla

*** Extra Modules ***

Module 16: Coexpression Analysis in R

  • Correlation: Pearson, Spearman, Kendall
  • Mutual Information
  • Partial Correlation (A,B,C)
  • Overlap with ENCODE and MSigDB data
  • ARACNe
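
A possible starting point for the correlation and partial correlation items, using only base R. The three variables A, B and C are simulated (an assumption for illustration): A and B both depend on C, so they correlate with each other, but the association should largely vanish once C is controlled for. Mutual information and ARACNe are not covered in this sketch.

```r
set.seed(42)
C <- rnorm(100)                 # common driver (simulated)
A <- C + rnorm(100, sd = 0.5)   # A depends on C
B <- C + rnorm(100, sd = 0.5)   # B depends on C

# The three coefficients offered by stats::cor().
methods <- c("pearson", "spearman", "kendall")
cors <- sapply(methods, function(m) cor(A, B, method = m))

# First-order partial correlation of a and b given ctrl (Pearson-based):
# r_ab.c = (r_ab - r_ac * r_bc) / sqrt((1 - r_ac^2) * (1 - r_bc^2))
partial_cor <- function(a, b, ctrl) {
  r_ab <- cor(a, b)
  r_ac <- cor(a, ctrl)
  r_bc <- cor(b, ctrl)
  (r_ab - r_ac * r_bc) / sqrt((1 - r_ac^2) * (1 - r_bc^2))
}

cors                  # A and B look strongly coexpressed
partial_cor(A, B, C)  # much weaker once C is accounted for
```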

Module 17: Alternative Transcript Counters

  • Salmon
  • Kallisto

Module 18: Detecting Gene Fusions

  • RNA: TopHat-Fusion
  • DNA: big translocation finders?

Module 19: Simple Machine Learning

  • Predicting Mutations with Gene Expression
  • glmnet, lasso, gradient boosting, caret package

Module 20: Survival Analysis

  • Kaplan-Meier Curves
  • Tests
  • Multiple groups
  • Comparing datasets

Module 21: Building an R plot with lattice

  • Canvas
  • Axes
  • Objects

Module 22: Clustering analysis in R

  • Hierarchical clustering (hclust and pvclust)
  • Tree cut and dynamic tree cut
  • K-means
  • Principal Component Analysis and t-SNE
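
The base-R part of this module could look roughly like the sketch below. It uses the built-in `iris` measurements as stand-in data (an assumption; the course would use expression matrices), and the choice of three clusters and average linkage is arbitrary. pvclust, dynamic tree cut and t-SNE need extra packages and are left out.

```r
# Stand-in data (assumption): 150 samples x 4 scaled numeric features.
x <- scale(iris[, 1:4])

# Hierarchical clustering on Euclidean distances, then cut into 3 groups.
hc <- hclust(dist(x), method = "average")
hc_groups <- cutree(hc, k = 3)

# K-means with several random starts to avoid a poor local optimum.
set.seed(1)
km <- kmeans(x, centers = 3, nstart = 10)

# PCA: proportion of variance explained by each component.
pca <- prcomp(x)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Cross-tabulate the two partitions to see how far they agree.
table(hclust = hc_groups, kmeans = km$cluster)
```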

Module 23: DNA shape prediction in R

  • The DNA shape properties (MGW, HelT, PropT, Roll, EP)
  • DNAshapeR package
  • Show the shape of similar promoters (H. pylori project)